[Info-vax] Looking for some text search ideas

Fri Sep 26 19:00:03 EDT 2014

On Friday, 26 September 2014 23:49:22 UTC+1, David Froble  wrote:
> Hein RMS van den Heuvel wrote:
> 
> > On Friday, September 26, 2014 1:27:04 PM UTC-4, David Froble wrote:
> 
> >>> Does anyone know of a more effective method 
> 
> > than a sequential pass through the data of searching a list of data 
> 
> > looking for text matches? 
> 
> > 
> 
> > Yes. Use 2 (or 10) passes each processing 1/2 (or 1/10) of the data.
> 
> > :-).
> 
> 
> 
> I don't see how this would help ??
> 
> 
> 
> > For an RMS sequential file, take 1/2 EOF. Read 8 (or so) blocks, 
> 
> > Start looking for a word on a word boundary which is smaller than
> 
> > LRL, and added to the current offset points to a similar word. Use
> 
> > that as the stopper for the first stream, and use it for a
> 
> > $FIND-by-RFA to kick off the second stream.
> 
> 
> 
> No file access.  Data will be loaded into memory once, and then searched 
> 
> upon request.
> 
> 
> 
> > Was that SQL just as a matter of example, or did you indeed want to
> 
> > use SQL syntax, and columns as such,
> 
> 
> 
> That was just an example that I seemed to remember from the last time I 
> 
> used SQL.  I thought it explained the need rather well, if one knows 
> 
> SQL.  There will be no SQL in the implementation.
> 
> 
> 
> > For recent OpenVMS versions you can use SEARCH/KEY=(POS=n,SIZ=n) for
> 
> > column style search.
> 
> > 
> 
> > Attunity's Connect product can give you that SQL option, but it does
> 
> > no special processing... $GET in a loop, string-matches against the
> 
> > column data, linear.
> 
> > 
> 
> > You'll have to determine whether it is worth an (improvement) effort.
> 
> >  How long does it take, and how many resources used (CPU, IO) using
> 
> > some KISS method. Next take a stab at how long it is allowed to take
> 
> > and how many resources are available (memory, cpu). Also, figure out
> 
> > how often. Is it worth your while to load up a helper structure once
> 
> > for reuse?
> 
> 
> 
> It is definitely helpful to load up the data once and keep it available. 
> 
>   In a simple test, it took 5 seconds wall time to load the data.  It's 
> 
> taking .05 seconds for the in-memory search.
> 
> 
> 
> The application is an inventory inquiry service.  It waits for a socket 
> 
> connection request, gets the search information, looks up the specified 
> 
> inventory, and returns some information, on hand, and such.  Runs all 
> 
> day long, and can be heavily used.  Thus wanting to avoid going to disk 
> 
> for each search by partial description request.  Once the selected list 
> 
> is completed, then disk access is required to get up to date inventory 
> 
> availability.

Hein: Use 2 (or 10) passes each processing 1/2 (or 1/10) of the data. :-)

David: I don't see how this would help ??

Me: Find a way to split the workload across multiple concurrent threads of execution, so that it can be spread across multiple processors and the time to completion comes down. 4 processors => roughly a quarter of the single-processor elapsed time at best, though the total effort (splitting the work and combining the results) will increase slightly... maybe.

Either that, or you missed Hein's :-)
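
For what it's worth, here is a rough sketch of the split I have in mind, written in Python only because it is compact; nothing in this thread says what language David is using. The names (search_chunk, parallel_search, workers) are made up for illustration, and the case-insensitive substring match is my assumption about what "partial description" means. One caveat: in standard CPython the threads will not actually run the string comparisons on several processors at once, so treat this as showing the partition-and-merge structure, not as a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def search_chunk(chunk, needle):
    """Linear scan of one slice of the in-memory list.

    chunk is (start_index, descriptions); returns (index, description)
    pairs, with index relative to the full list.
    """
    start, descriptions = chunk
    needle = needle.lower()  # assumption: matching is case-insensitive
    return [(start + i, d)
            for i, d in enumerate(descriptions)
            if needle in d.lower()]


def parallel_search(descriptions, needle, workers=4):
    """Split the list into about `workers` slices and scan them concurrently."""
    if not descriptions:
        return []
    size = -(-len(descriptions) // workers)  # ceiling division
    chunks = [(i, descriptions[i:i + size])
              for i in range(0, len(descriptions), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so the merged hits
        # stay in the same order a single sequential pass would give.
        parts = list(pool.map(lambda c: search_chunk(c, needle), chunks))
    return [hit for part in parts for hit in part]
```

Calling parallel_search(list_of_descriptions, "widget", workers=4) should return the same hits as one sequential pass. The slicing and the final merge are the "slightly more effort" part, and with a .05 second search they may well cost more than they save.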