[Info-vax] Looking for some text search ideas

Johnny Billquist bqt at softjar.se
Sun Sep 28 07:34:10 EDT 2014


On 2014-09-28 01:26, David Froble wrote:
> Jan-Erik Soderholm wrote:
>
>> I'm quite sure that regexp is *not* the answer for actual issues.
>> It is not that complex, and you'd put the burden on the user to
>> come up with the actual regexp. That does not work, of course.
>> I'm sure it will all end up in some simple sequential text
>> search over the data/file/table...
>>
>> Jan-Erik.
>
> It's already in production.
>
> I didn't check the code, but I'd guess it's something like:
>
> Assuming the data is in 3 arrays Mfg$(), Part$(), and Desc$()
>
> And the list is a RECORD variable array called List with elements Mfg$
> and Part$
>
> And the search masks are Mask1$, Mask2$, and Mask3$
>
> ! Loop through the array
>
> For I% = 1% to MaxRec%
>
>      ! If a mfg code is specified, skip unless a match
>
>      Iterate if Mfg$ <> "" and Mfg$ <> MFG$(I%)
>
>      ! Search the description
>
>      Iterate    Unless    InStr( 1% , Desc$(I%) , Mask1$ ) &
>          Or    Instr( 1% , Desc$(I%) , Mask2$ ) &
>          Or    Instr( 1% , Desc$(I%) , Mask3$ )
>
>      ! Found one, add to list
>
>      J% = J% + 1%
>      List(J%)::Mfg$ = Mfg$(I%)
>      List(J%)::Part$ = Part$(I%)
>
> Next I%
>
> Now, I'd want to allow for a variable number of search masks, and
> support "AND" and "OR", which would mean a bit more code, but the
> example shows that if you've got enough memory and CPU it's a rather
> simple thing to do.
>
> Where I'd look for some efficiency is in checking the description for
> multiple masks in one pass, but, I don't think that would be possible,
> even if some tool made it seem it was doing it.  When you get down to
> the actual machine instructions, I'm betting that it would do one mask
> at a time.

I'm not sure I understand your explanation. What are the search masks? 
The different words that a user entered, and you want them all to appear 
in the description, but they are allowed to appear in any order?

If you were to place the masks in an array instead, you could accomplish 
the check with a loop.
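
A minimal sketch of such a loop, in Python rather than BASIC. It assumes
that every mask has to appear (the "AND" case) and that a plain,
case-sensitive substring test is good enough. The names matches_all,
description and masks are made up for the illustration:

```python
def matches_all(description, masks):
    """Return True if every mask occurs somewhere in description."""
    for mask in masks:
        # Plain substring test, the same idea as InStr() in the BASIC code.
        if mask not in description:
            return False
    return True


print(matches_all("blue foo widget with bar handle", ["bar", "foo"]))
print(matches_all("blue foo widget", ["bar", "foo"]))
```

For the "OR" case the loop would instead return True on the first mask
that is found.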

Using a regexp, you could do it all in one line, and the pattern can be
generated by the computer, given a list of masks that you are looking
for. The regexp will not be pretty, but honestly, who would care? It
is not meant to be read by a human.

Essentially, the regexp would look something like this
(let's say we have two mask words, "foo" and "bar"):

(.*foo.*bar.*)|(.*bar.*foo.*)

Now, either that will match, in which case you have both words, or it
will not match, in which case at least one of the words is missing.
Expanding it to three words is not hard, but it grows fast, since n
words need n! alternatives (6 for three words):

(.*foo.*bar.*xxx.*)|(.*foo.*xxx.*bar.*)|(.*bar.*foo.*xxx.*)|(.*bar.*xxx.*foo.*)|(.*xxx.*foo.*bar.*)|(.*xxx.*bar.*foo.*)

Essentially, you just list all permutations of the words you have, with 
the right characters in between. Creating this match string is fairly 
trivial using a computer. All you feed in are the words you are looking 
for, and out you get the string.
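
As a rough sketch of that generator, here in Python (build_pattern is
just a name picked for the example, and escaping the words with
re.escape is an extra assumption, so that a word containing "." or "*"
is still matched literally):

```python
import itertools
import re


def build_pattern(words):
    """Build one regexp that matches only if every word appears, in any order."""
    alternatives = []
    for perm in itertools.permutations(words):
        # Join the words of this ordering with ".*" so anything may sit between them.
        body = ".*".join(re.escape(word) for word in perm)
        alternatives.append("(.*" + body + ".*)")
    # One alternative per permutation, so n words give n! alternatives.
    return "|".join(alternatives)


pattern = build_pattern(["foo", "bar"])
print(pattern)  # (.*foo.*bar.*)|(.*bar.*foo.*)
print(bool(re.search(pattern, "a bar next to a foo")))
```

One thing to be aware of: this is not exactly the same test as a loop of
substring searches. The regexp needs the words at non-overlapping
positions, so with the words "ab" and "bc" the text "abc" contains both
as substrings but does not match the generated pattern.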

	Johnny

-- 
Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt at softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol


