[Info-vax] Looking for some text search ideas

David Froble davef at tsoft-inc.com
Fri Sep 26 18:49:22 EDT 2014


Hein RMS van den Heuvel wrote:
> On Friday, September 26, 2014 1:27:04 PM UTC-4, David Froble wrote:
>> Does anyone know of a more effective method than a sequential pass
>> through the data of searching a list of data looking for text
>> matches?
> 
> Yes. Use 2 (or 10) passes each processing 1/2 (or 1/10) of the data.
> :-).

I don't see how this would help ??

> For an RMS sequential file, take 1/2 EOF. Read 8 (or so) blocks,
> then start looking for a word on a word boundary which is smaller
> than LRL, and added to the current offset points to a similar word.
> Use that as the stopper for the first stream, and use it for a
> $FIND-by-RFA to kick off the second stream.

No file access.  Data will be loaded into memory once, and then searched 
upon request.

> Was that SQL just as a matter of example, or did you indeed want to
> use SQL syntax, and columns as such?

That was just an example that I seemed to remember from the last time I 
used SQL.  I thought it explained the need rather well, if one knows 
SQL.  There will be no SQL in the implementation.

> For recent OpenVMS versions you can use SEARCH/KEY=(POS=n,SIZ=n) for
> column style search.
> 
> Attunity's Connect product can give you that SQL option, but it does
> no special processing... $GET in a loop, string-matches against the
> column data, linear.
> 
> You'll have to determine whether it is worth an (improvement) effort.
> How long does it take, and how many resources (CPU, IO) does it use
> with some KISS method? Next take a stab at how long it is allowed to
> take and how many resources are available (memory, CPU). Also, figure
> out how often it runs. Is it worth your while to load up a helper
> structure once for reuse?

It is definitely helpful to load up the data once and keep it available. 
In a simple test, it took 5 seconds wall time to load the data, and the 
in-memory search is taking 0.05 seconds.
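
A minimal sketch of that load-once, search-many pattern, written in Python 
purely for illustration. The record layout (part number plus description), 
the names load_descriptions and search, and the case-insensitive matching 
are assumptions, not details of the actual service.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (part_number, description): hypothetical layout


def load_descriptions(rows: Iterable[Record]) -> List[Tuple[str, str, str]]:
    """Load once: keep each record plus a case-folded copy of its description."""
    return [(part, desc, desc.casefold()) for part, desc in rows]


def search(table: List[Tuple[str, str, str]], fragment: str) -> List[Record]:
    """One sequential pass over the in-memory table, substring match per row."""
    needle = fragment.casefold()
    return [(part, desc) for part, desc, folded in table if needle in folded]


# Made-up sample data standing in for the inventory descriptions.
table = load_descriptions([
    ("A100", "Hex bolt 1/4 inch"),
    ("A200", "Hex nut 1/4 inch"),
    ("B300", "Flat washer"),
])
print(search(table, "hex"))
```

Folding the descriptions at load time moves that cost out of every request, 
so each search is only the linear substring scan.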

The application is an inventory inquiry service.  It waits for a socket 
connection request, gets the search information, looks up the specified 
inventory, and returns some information, such as quantity on hand.  It 
runs all day long, and can be heavily used.  Hence the wish to avoid 
going to disk for each search-by-partial-description request.  Once the 
selected list is completed, disk access is still required to get 
up-to-date inventory availability.
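
The two-phase flow described above could look roughly like the following 
Python sketch. The function handle_request, the read_on_hand callback and 
the sample data are all hypothetical; socket handling is left out, and the 
dictionary stands in for whatever disk lookup the real service performs.

```python
from typing import Callable, Dict, List, Tuple


def handle_request(
    fragment: str,
    descriptions: Dict[str, str],
    read_on_hand: Callable[[str], int],
) -> List[Tuple[str, str, int]]:
    """Filter in memory first, then fetch fresh availability per match."""
    needle = fragment.casefold()
    # Phase 1: in-memory pass over descriptions, no disk access.
    matches = [p for p, d in descriptions.items() if needle in d.casefold()]
    # Phase 2: one fresh lookup for each selected item only.
    return [(p, descriptions[p], read_on_hand(p)) for p in matches]


# Made-up data; stock_on_disk stands in for the on-disk inventory file.
descriptions = {"A100": "Hex bolt", "B300": "Flat washer"}
stock_on_disk = {"A100": 12, "B300": 0}
lookups: List[str] = []


def read_on_hand(part: str) -> int:
    lookups.append(part)  # record which parts needed a "disk" read
    return stock_on_disk[part]


result = handle_request("bolt", descriptions, read_on_hand)
print(result, lookups)
```

Only the matched items trigger a lookup, so the number of disk reads per 
request scales with the size of the selected list rather than the whole 
inventory.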


