[Info-vax] Looking for some text search ideas
Bill Pedersen
pedersen at ccsscorp.com
Fri Sep 26 13:47:27 EDT 2014
On 9/26/2014 1:27 PM, David Froble wrote:
> Perhaps in place of discussing non-existent malware on the non-existent
> VMS on x86, I might solicit some ideas.
>
> Our applications are not using a RDBMS.
>
> A request has come up to be able to find any data which contains some
> specific text. An example might be any product description that
> contains the text "gasket". Using keys won't help, because the key
> might be "head gasket".
>
> This is similar I believe to the SQL request something like
>
> SELECT * from PRODUCT where DESCRIPTION %like% gasket
>
> My perspective is that on today's systems with gobs of memory that much
> of a database's information is probably in memory, thus not incurring
> the overhead of lots of disk seeks.
>
> It's also my perspective that such a search is a sequential pass through
> the data looking for matches.
>
> And so this is my question. Does anyone know of a more effective method
> than a sequential pass through the data of searching a list of data
> looking for text matches?
>
> I'm looking at possibilities from global data making the search
> available to all, to storage inside the one function currently needing
> this capability. Some more research into the application needs will
> determine the answer to this question.
The specifics of how you do your search and handle potential matches have
been researched over the years.
A recent example was to search only for the leading letter of the
string. On a match, check the character at the position where the last
letter of the desired string would fall. If that matches as well, compare
the rest of the string. If it does not, skip the full comparison and
resume the search for the first letter at the next position. This has
been shown to speed up the searches.
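As a rough illustration only, the first-letter/last-letter check could be
sketched in Python as below. The function names and the case-insensitive
record filter are my own assumptions, not taken from any particular paper
or from your application. It is still a sequential pass; it just rejects
most candidate positions with one extra character comparison.

```python
def find_first_last(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.

    Sketch of the heuristic: locate the leading character, then test the
    character where the pattern's last character would fall before doing
    the full comparison.
    """
    m = len(pattern)
    if m == 0:
        return 0
    first, last = pattern[0], pattern[-1]
    limit = len(text) - m          # last position where the pattern still fits
    i = text.find(first, 0)
    while i != -1 and i <= limit:
        # Cheap reject on the last character, full compare only if it agrees.
        if text[i + m - 1] == last and text[i:i + m] == pattern:
            return i
        # Resume at the very next position so overlapping candidates are kept.
        i = text.find(first, i + 1)
    return -1


def matching_records(records, needle):
    """Hypothetical helper: descriptions containing needle, ignoring case."""
    needle = needle.lower()
    return [r for r in records if find_first_last(r.lower(), needle) != -1]
```

Calling matching_records(descriptions, "gasket") on a list of product
descriptions would return those containing "gasket" anywhere, including
"head gasket".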
It is not clear how much more you can do to improve search performance,
but I am certain there are papers on this and other options.
Bill.