[Info-vax] Looking for some text search ideas

David Froble davef at tsoft-inc.com
Sun Sep 28 11:03:23 EDT 2014


Johnny Billquist wrote:
> On 2014-09-28 01:26, David Froble wrote:
>> Jan-Erik Soderholm wrote:
>>
>>> I'm quite sure that regexp is *not* the answer for actual issues.
>>> It is not that complex, and you'd put the burden on the user to
>>> come up with the actual regexp. That does not work, of course.
>>> I'm sure it will all end up in some simple text search,
>>> sequential over the data/file/table...
>>>
>>> Jan-Erik.
>>
>> It's already in production.
>>
>> I didn't check the code, but I'd guess it's something like:
>>
>> Assuming the data is in 3 arrays Mfg$(), Part$(), and Desc$()
>>
>> And the list is a RECORD variable array called List with elements Mfg$
>> and Part$
>>
>> And the search masks are Mask1$, Mask2$, and Mask3$
>>
>> ! Loop through the array
>>
>> For I% = 1% to MaxRec%
>>
>>      ! If a mfg code is specified, skip unless a match
>>
>>      Iterate if Mfg$ <> "" and Mfg$ <> MFG$(I%)
>>
>>      ! Search the description
>>
>>      Iterate    Unless    InStr( 1% , Desc$(I%) , Mask1$ ) &
>>          Or    Instr( 1% , Desc$(I%) , Mask2$ ) &
>>          Or    Instr( 1% , Desc$(I%) , Mask3$ )
>>
>>      ! Found one, add to list
>>
>>      J% = J% + 1%
>>      List(J%)::Mfg$ = Mfg$(I%)
>>      List(J%)::Part$ = Part$(I%)
>>
>> Next I%
>>
>> Now, I'd want to allow for a variable number of search masks, and
>> respect for "AND" and "OR" which would mean a bit more code, but the
>> example shows that if you got enough memory and CPU it's a rather simple
>> thing to do.
>>
>> Where I'd look for some efficiency is in checking the description for
>> multiple masks in one pass, but, I don't think that would be possible,
>> even if some tool made it seem it was doing it.  When you get down to
>> the actual machine instructions, I'm betting that it would do one mask
>> at a time.
> 
> I'm not sure I understand your explanation. What are the search masks? 
> The different words that a user entered, and you want them all to appear 
> in the description, but they are allowed to appear in any order?
> 
> If you were to place the masks in an array instead, you could accomplish 
> the check with a loop.
> 
> Using a regexp, you could do it all in one line, and the match can be 
> generated by the computer, given a list of masks that you are looking 
> for. The regexp will not be pretty, but honestly, who would care. It 
> would not be meant for looking at by a human.
> 
> Essentially, the mask would look something like this:
> (Let's say we have two mask words, "foo" and "bar")
> 
> (.*foo.*bar.*)|(.*bar.*foo.*)
> 
> Now, either that will match, in which case you have both words, or it 
> will not match, in which case at least one of the words is missing.
> Expanding it to three words is not hard, but it grows fast:
> 
> (.*foo.*bar.*xxx.*)|(.*foo.*xxx.*bar.*)|(.*bar.*foo.*xxx.*)|(.*bar.*xxx.*foo.*)|(.*xxx.*foo.*bar.*)|(.*xxx.*bar.*foo.*)
> 
> 
> Essentially, you just list all permutations of the words you have, with 
> the right characters in between. Creating this match string is fairly 
> trivial using a computer. All you feed in are the words you are looking 
> for, and out you get the string.
> 
>     Johnny
> 
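
Generating the match string from the list of words, as described above,
can be sketched as follows. This is a minimal illustration in Python
rather than BASIC, and the function name all_words_pattern is made up
for the example. It escapes each word so that it is matched literally,
which the hand-written patterns above do not need because "foo" and
"bar" contain no special characters.

```python
import itertools
import re


def all_words_pattern(words):
    """Build a regexp that matches only if every word appears, in any order.

    Hypothetical helper: one alternative is emitted per permutation of the
    words, so N words produce N! alternatives.
    """
    alternatives = []
    for perm in itertools.permutations(words):
        # Join the escaped words with ".*" and wrap the result in "(.* ... .*)"
        body = ".*".join(re.escape(word) for word in perm)
        alternatives.append("(.*" + body + ".*)")
    return "|".join(alternatives)


pattern = all_words_pattern(["foo", "bar"])
print(pattern)  # (.*foo.*bar.*)|(.*bar.*foo.*)

# Both words present, in either order, gives a match
print(bool(re.search(pattern, "a bar and then a foo")))  # True
# One word missing gives no match
print(bool(re.search(pattern, "only foo here")))  # False
```

The number of alternatives is the factorial of the number of words
(2, 6, 24, 120, ...), so this is only practical for a handful of words.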

That was a trivial example, not what we've actually done, and yes, lots 
could be done with "AND" and "OR".  The actual initial implementation is 
even simpler, with just one search mask (string).  We're putting in some 
code to count the number of times the "feature" actually gets used. 
Five times a day, or week, would indicate "good enough".
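
For comparison, a variable number of masks with "AND" or "OR" needs
only a small loop over the masks. The sketch below is in Python, not
the production BASIC, and is not what we actually run. The names
search_parts and require_all, the tuple layout, and the sample parts
are all invented for the illustration.

```python
def search_parts(parts, masks, mfg="", require_all=False):
    """Sequential search over (mfg, part, description) tuples.

    Hypothetical sketch: with require_all=False any mask may match ("OR"),
    with require_all=True every mask must match ("AND").
    """
    hits = []
    for part_mfg, part_no, desc in parts:
        # If a mfg code is specified, skip unless it matches
        if mfg and mfg != part_mfg:
            continue
        # Plain, case-sensitive substring test for each mask
        found = [mask in desc for mask in masks]
        matched = all(found) if require_all else any(found)
        if matched:
            # Found one, add to list
            hits.append((part_mfg, part_no))
    return hits


# Made-up sample data, for illustration only
parts = [
    ("MTD", "P36", "36 INCH SNOW PLOW"),
    ("MTD", "P48", "48 INCH SNOW PLOW"),
    ("ACME", "G1", "HEAD GASKET"),
]

print(search_parts(parts, ["SNOW", "PLOW"], mfg="MTD", require_all=True))
# [('MTD', 'P36'), ('MTD', 'P48')]
```

Note that with an empty mask list the "AND" form matches every record
and the "OR" form matches none, so a real version would want to check
for that case first.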

Upon further inquiry, I've been informed that the feature was not asked 
for by those selling parts.  It came from an equipment dealer that would 
rent or sell things like snow plows and such.  Thus the number of "hits" 
in a search is going to be much smaller than for, say, a "gasket".  Bill 
said a search for gasket returned over 3000 hits, which is rather useless.

So, in actual use, a customer might ask, "I need a snow plow for an MTD 
tractor", and the dealer can do a search and come up with "I've got a 36 
inch plow and a 48 inch plow".  From that perspective this request makes 
a whole bunch more sense.  I'm also guessing such a dealer has a far 
smaller product file.


