[Info-vax] Looking for some text search ideas
johnwallace4 at yahoo.co.uk
Fri Sep 26 19:00:03 EDT 2014
On Friday, 26 September 2014 23:49:22 UTC+1, David Froble wrote:
> Hein RMS van den Heuvel wrote:
> > On Friday, September 26, 2014 1:27:04 PM UTC-4, David Froble wrote:
> >>> Does anyone know of a more effective method
> >>> than a sequential pass through the data of searching a list of data
> >>> looking for text matches?
> >
> > Yes. Use 2 (or 10) passes each processing 1/2 (or 1/10) of the data.
> > :-).
>
> I don't see how this would help ??
>
> > For an RMS sequential file, take 1/2 EOF. Read 8 (or so) blocks.
> > Start looking for a word on a word boundary which is smaller than
> > LRL, and added to the current offset points to a similar word. Use
> > that as the stopper for the first stream, and use it for a
> > $FIND-by-RFA to kick off the second stream.
>
> No file access. Data will be loaded into memory once, and then searched
> upon request.
>
> > Was that SQL just as a matter of example, or did you indeed want to
> > use SQL syntax, and columns as such,
>
> That was just an example that I seemed to remember from the last time I
> used SQL. I thought it explained the need rather well, if one knows
> SQL. There will be no SQL in the implementation.
>
> > For recent OpenVMS versions you can use SEARCH/KEY=(POS=n,SIZ=n) for
> > column style search.
> >
> > Attunity's Connect product can give you that SQL option, but it does
> > no special processing... $GET in a loop, string-matches against the
> > column data, linear.
> >
> > You'll have to determine whether it is worth an (improvement) effort.
> > How long does it take, and how many resources are used (CPU, IO) using
> > some KISS method. Next take a stab at how long it is allowed to take
> > and how many resources are available (memory, cpu). Also, figure out
> > how often. Is it worth your while to load up a helper structure once
> > for reuse?
>
> It is definitely helpful to load up the data once and keep it available.
> In a simple test, it took 5 seconds wall time to load the data. It's
> taking .05 seconds for the in-memory search.
>
> The application is an inventory inquiry service. It waits for a socket
> connection request, gets the search information, looks up the specified
> inventory, and returns some information, on hand, and such. Runs all
> day long, and can be heavily used. Thus wanting to avoid going to disk
> for each search by partial description request. Once the selected list
> is completed, then disk access is required to get up to date inventory
> availability.
Hein: Use 2 (or 10) passes each processing 1/2 (or 1/10) of the data. :-)
David: I don't see how this would help ??
Me: Find a way to split the workload across multiple concurrent threads of execution, so that it can be spread across multiple processors, thereby reducing the time to completion. 4 processors => roughly a quarter of the single-processor elapsed time, though the total CPU effort required will increase slightly (splitting the work and combining the results is not free)... maybe.
Either that, or you missed Hein's :-)
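For what it's worth, here is a rough sketch of the split-the-workload idea for an in-memory list, written in Python purely for illustration. Everything in it is an assumption on my part: the names (search_chunk, parallel_search), the sample inventory descriptions, and the choice of a case-insensitive substring match. Note also that threads in standard CPython take turns on one processor for this kind of work, so the sketch shows the partitioning and the merging of results rather than a real speedup; to actually spread the scan across processors you would want separate processes, or a language with truly concurrent threads.

```python
from concurrent.futures import ThreadPoolExecutor


def search_chunk(records, start, stop, needle):
    """Sequentially scan records[start:stop]; return indexes of matching records."""
    needle = needle.lower()
    return [i for i in range(start, stop) if needle in records[i].lower()]


def parallel_search(records, needle, workers=4):
    """Split the list into contiguous slices and scan the slices concurrently."""
    if not records:
        return []
    size = -(-len(records) // workers)  # ceiling division: records per slice
    bounds = [(lo, min(lo + size, len(records)))
              for lo in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda b: search_chunk(records, b[0], b[1], needle), bounds)
        # Slices are contiguous and map() keeps their order, so the merged
        # result is already in ascending index order.
        return [i for part in parts for i in part]


# Hypothetical inventory descriptions, loaded into memory once.
inventory = [
    "3/4 in. brass elbow",
    "Copper pipe 10 ft",
    "Brass ball valve",
    "PVC elbow 90 deg",
    "Steel washer",
]
hits = parallel_search(inventory, "brass")
```

Each worker gets its own contiguous slice, so no locking is needed while searching; the only shared step is concatenating the per-slice results at the end, which is where the slight extra effort comes from.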