[Info-vax] Looking for some text search ideas
Jan-Erik Soderholm
jan-erik.soderholm at telia.com
Sat Sep 27 06:23:24 EDT 2014
David Froble wrote 2014-09-27 03:34:
> Craig A. Berry wrote:
>> On 9/26/14, 12:27 PM, David Froble wrote:
>>
>>> Our applications are not using a RDBMS.
>>
>>> A request has come up to be able to find any data which contains some
>>> specific text. An example might be any product description that
>>> contains the text "gasket". Using keys won't help, because the key
>>> might be "head gasket".
>>
>> I assume from the talk of keys that these are RMS indexed files? Does
>> either the target of a search or the unit to be returned when you find
>> something ever span record boundaries, e.g.:
>>
>> XYZ001This is a news-
>> XYZ002worthy message.
>>
>> If I search for "newsworthy" should I consider that group of records a
>> match and return both records? Should I be able to match a word broken
>> across record boundaries? Are all the searches "word" searches with
>> clearly defined delimiters and known character sets? Or if I search for
>> "sage" should I match "message"?
>
> Nothing so complex.
>
>> I see from a subsequent post that you are just doing INSTR on arrays of
>> strings. If that works for you, that's fine. A good regular expression
>> engine would run circles around INSTR in both functionality and
>> performance. A full text search engine would too, and if the data are
>> simple, you could build your own with only moderate trouble that indexed
>> words (or characters if you wish) and saved either unique key values or
>> RFAs to get from the search string back to the containing record(s).
>>
>>
>
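The home-built word index Craig describes above could be sketched roughly as below. This is only an illustration: the sample records, the key format and the names build_index and lookup are all made up for the example, and plain strings stand in for whatever key or RFA the real file would use.

```python
import re
from collections import defaultdict

# Hypothetical sample data: primary key (Mfg code + Part #) -> description.
RECORDS = {
    "BRIGGS-1001": "Gasket, head",
    "KOHLER-2002": "Head gasket",
    "KOHLER-2003": "Air filter",
}

def build_index(records):
    """Map each lower-cased word to the set of keys whose description contains it."""
    index = defaultdict(set)
    for key, description in records.items():
        # Split on anything that is not a letter or digit.
        for word in re.findall(r"[a-z0-9]+", description.lower()):
            index[word].add(key)
    return index

def lookup(index, word):
    """Return the sorted keys of all records containing the given whole word."""
    return sorted(index.get(word.lower(), ()))

index = build_index(RECORDS)
gasket_keys = lookup(index, "gasket")
```

Word order in the description no longer matters, so "Gasket, head" and "Head gasket" are both found. The trade-off is that a whole-word index like this will not find "sage" inside "message", and it has to be kept up to date when descriptions change.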
> Not RMS, but similar. The product file is just records with 50-60 data
> fields. Primary key is Mfg code + Part #. Part description is not keyed.
> No good reason to do so. Briggs may call a part "Gasket, head" while
> Kohler may call a similar part "Head gasket". Not worth trying for any
> type of keying. Thus my conjecture that a brute force pass through all the
> descriptions is about all that can be done.
>
> But now you mention a "regular expression engine". Never heard of such.
http://en.wikipedia.org/wiki/Regular_expression :
In theoretical computer science and formal language theory, a regular
expression (abbreviated regex or regexp) and sometimes called a rational
expression[1][2] is a sequence of characters that forms a search pattern,
mainly for use in pattern matching with strings, or string matching, i.e.
"find and replace"-like operations. The concept arose in the 1950s, when
the American mathematician Stephen Kleene formalized the description of a
regular language, and came into common use with the Unix text processing
utilities ed, an editor, and grep (global regular expression print), a
filter.
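To make that concrete for the "gasket" case, here is a minimal sketch using Python's standard re module. The part keys and descriptions are invented sample data, and the search function is just a helper for this example; it still makes a brute-force pass over every description, only with a pattern instead of a fixed substring.

```python
import re

# Hypothetical sample data: (Mfg code + Part #, description).
PARTS = [
    ("BRIGGS-1001", "Gasket, head"),
    ("KOHLER-2002", "Head gasket"),
    ("KOHLER-2003", "Air filter"),
    ("BRIGGS-1004", "GASKETS, carburetor (set)"),
]

def search(pattern, parts=PARTS):
    """Return the keys of all parts whose description matches the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [key for key, description in parts if rx.search(description)]

# Plain substring match, ignoring case: also hits "GASKETS".
substring_hits = search(r"gasket")

# \b marks a word boundary, so this matches "gasket" only as a whole word.
whole_word_hits = search(r"\bgasket\b")
```

The first pattern behaves much like a case-insensitive INSTR. The second shows what the pattern syntax adds: the same word-boundary marker would keep a search for "sage" from matching "message".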
I was a bit surprised you didn't know about regexp.
> Guess I need to look up the term to see what it's about. Maybe time for
> this old dog to learn something new.