[Info-vax] improve performance of /EXCLUDE
mcleanjoh at gmail.com
Tue Jan 27 18:45:27 EST 2015
On Wednesday, January 28, 2015 at 9:07:04 AM UTC+11, Stephen Hoffman wrote:
> On 2015-01-27 21:53:51 +0000, mcle*@***mail.com said:
>
> > Isn't this mainly a factor of the respective file systems? Linux/Unix
> > is just a stream of bytes on disk whereas RMS is designed to provide a
> > variety of commercially useful file structures (e.g. Indexed).
>
> If you want to do a file-level search using traditional Unix or VMS
> tools, then yes, you're going to be bound by the throughput of the
> underlying file system and the associated storage. As for a fast
> search, those searches are calculated and cached ahead of time, and
> vastly faster. This means you might use a different tool -- mdfind,
> rather than find, for instance -- but dinking around with a traditional
> file-based search is not something most folks are interested in, once
> they've used a cached, faster search tool.
>
> > It's expecting a lot to believe that searching compressed data in an
> > indexed file will be as fast as a simple pattern match in a stream of
> > bytes. Also a Linux/Unix search would have fewer overheads when
> > determining the start and end of the record than VMS which has to use
> > the specific record type and formatting information to try to figure
> > out where the record starts and ends.
>
> Again, I'd encourage having a look at how a more modern search tool
> works -- the ht://Dig search tool was ported and available on VMS for a
> while, and was decently speedy. As is typical with these search tools,
> there's a metadata importer for the various formats, as this greatly
> eases the effort of adding new file formats into the search index. Got
> some weird new format that VSI has never heard of, or want to allow
> tailored searching of what would otherwise use a generic plug-in for
> the specific file format? Create a plug-in for the particular format.
>
>
> --
> Pure Personal Opinion | HoffmanLabs LLC
What difference do larger RMS buffers make to the search time?
I once wrote a 'copy' program (with an entry point to make it callable) that checked how much memory I had available (i.e. WSMAX less the current allocation) and used a large chunk of that as the copy buffer. I *think* it was faster than the normal COPY, but this wasn't easy to check: the first run left the file cached on the SAN, so read access times on later runs fell quite a bit. A 'search' using a similar scheme might be very fast.
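
To make the idea concrete, here is a rough sketch in Python rather than anything VMS-specific. It is not the original program: the function names and the buf_size parameter are made up for illustration, and the buffer size is simply passed in by the caller instead of being derived from WSMAX. The copy reuses one large buffer for every read, and the search reads the file in the same large chunks, carrying over a few bytes so a match that straddles two chunks is still found.

```python
import os
import tempfile


def copy_large_buffer(src_path, dst_path, buf_size=64 * 1024 * 1024):
    """Copy src_path to dst_path through one large reusable buffer.

    buf_size is a hypothetical tuning knob; the original program sized
    its buffer from the memory available to the process.
    Returns the number of bytes copied.
    """
    buf = memoryview(bytearray(buf_size))
    total = 0
    # Unbuffered reads go straight into our own buffer.
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        while True:
            n = src.readinto(buf)
            if not n:
                break
            dst.write(buf[:n])
            total += n
    return total


def find_first(path, pattern, buf_size=64 * 1024 * 1024):
    """Return the byte offset of the first match of pattern, or -1."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    keep = len(pattern) - 1  # bytes carried over between chunks
    tail = b""
    offset = 0  # file offset of the first byte of tail + chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(buf_size)
            if not chunk:
                return -1
            window = tail + chunk
            idx = window.find(pattern)
            if idx != -1:
                return offset + idx
            start = max(0, len(window) - keep)
            tail = window[start:]
            offset += start


if __name__ == "__main__":
    # Tiny demonstration with a deliberately small buffer.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "in.dat")
    dst = os.path.join(workdir, "out.dat")
    with open(src, "wb") as f:
        f.write(b"aaabcdzzz")
    print(copy_large_buffer(src, dst, buf_size=4))
    print(find_first(dst, b"bcd", buf_size=4))
```

Any timing comparison with a sketch like this has the same problem as described above: the second run mostly measures the cache, not the disk.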