[Info-vax] improve performance of /EXCLUDE

Tue Jan 27 18:45:27 EST 2015

On Wednesday, January 28, 2015 at 9:07:04 AM UTC+11, Stephen Hoffman wrote:
> On 2015-01-27 21:53:51 +0000, mcle*@***mail.com said:
> 
> > Isn't this mainly a factor of the respective file systems?  Linux/Unix 
> > is just a stream of bytes on disk whereas RMS is designed to provide a 
> > variety of commercially useful file structures (e.g. Indexed).
> 
> If you want to do a file-level search using traditional Unix or VMS 
> tools, then yes, you're going to be bound by the throughput of the 
> underlying file system and the associated storage.  As for the fast 
> search tools, their results are calculated and cached ahead of time, 
> so they are vastly faster.  This means you might use a different tool -- mdfind, 
> rather than find, for instance -- but dinking around with a traditional 
> file-based search is not something most folks are interested in, once 
> they've used a cached, faster search tool.
> 
> > It's expecting a lot to believe that searching compressed data in an 
> > indexed file will be as fast as a simple pattern match in a stream of 
> > bytes.  Also, a Linux/Unix search would have less overhead in 
> > determining where each record starts and ends than VMS, which has to 
> > use the specific record type and format information to find the 
> > record boundaries.
> 
> Again, I'd encourage having a look at how a more modern search tool 
> works -- the ht://Dig search tool was ported to VMS and was available for a 
> while, and was decently speedy.  As is typical with these search tools, 
> there's a metadata importer for the various formats, as this greatly 
> eases the effort of adding new file formats into the search index.  Got 
> some weird new format that VSI has never heard of, or want to allow 
> tailored searching of what would otherwise use a generic plug-in for 
> the specific file format?  Create a plug-in for the particular format.
> 
> 
> -- 
> Pure Personal Opinion | HoffmanLabs LLC
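
To illustrate the difference between a file-level scan and a search against results "calculated and cached ahead of time", here is a minimal Python sketch of an inverted index. It is not how mdfind or any other real tool is implemented; the function names are hypothetical, and the in-memory dict stands in for a persistent, incrementally updated index.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def scan_search(files, word):
    """File-level search: every file is read and scanned on every query."""
    word = word.lower()
    return sorted(name for name, text in files.items()
                  if word in tokenize(text))

def build_index(files):
    """Done once, ahead of time: map each word to the files containing it."""
    index = defaultdict(set)
    for name, text in files.items():
        for token in tokenize(text):
            index[token].add(name)
    return index

def index_search(index, word):
    """Cached search: a single dictionary lookup per query."""
    return sorted(index.get(word.lower(), set()))

if __name__ == "__main__":
    files = {
        "LOGIN.COM": "$ SET TERMINAL/INQUIRE",
        "BUILD.COM": "$ CC MAIN.C\n$ LINK MAIN",
        "NOTES.TXT": "Link the image after compiling main",
    }
    index = build_index(files)
    print(scan_search(files, "link"))
    print(index_search(index, "link"))
```

Both calls return the same files; the difference is that the scan's cost grows with the total amount of data on every query, while the index pays that cost once, when it is built or updated.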

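The quoted point about finding record boundaries can be made concrete. Below is a minimal Python sketch contrasting a stream file, where a record simply ends at a newline, with a byte layout modelled on RMS variable-length records: each record is assumed to be preceded by a 2-byte little-endian byte count and padded so the next count starts on an even offset. This is an illustration under that assumption only. It ignores the other RMS record formats, indexed-file buckets and key compression, and real VMS files should be read through RMS rather than parsed as raw bytes. The function names are hypothetical.

```python
import struct

def stream_records(data: bytes):
    """Stream-LF style: records are just the bytes between newlines."""
    records = data.split(b"\n")
    if records and records[-1] == b"":
        records.pop()  # a trailing newline terminates the last record
    return records

def var_records(data: bytes):
    """Assumed variable-length layout: 2-byte little-endian count, then the
    record bytes, then one pad byte if the record length is odd."""
    records = []
    pos = 0
    while pos + 2 <= len(data):
        (length,) = struct.unpack_from("<H", data, pos)
        if length == 0xFFFF:  # treated here as "no more records"
            break
        pos += 2
        records.append(data[pos:pos + length])
        pos += length + (length & 1)  # skip the pad byte after odd lengths
    return records

def pack_var(records):
    """Build a test buffer in the same assumed layout."""
    out = bytearray()
    for rec in records:
        out += struct.pack("<H", len(rec)) + rec
        if len(rec) & 1:
            out += b"\x00"
    return bytes(out)

if __name__ == "__main__":
    print(stream_records(b"HELLO\nVMS\n"))
    print(var_records(pack_var([b"HELLO", b"VMS"])))
```

Both parsers do only a few operations per record; the sketch says nothing about the cost of RMS's own processing on a real system.
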
What difference do larger RMS buffers make to the search time?

I once wrote a 'copy' program (with an entry point so it could also be called as a routine) that determined how much memory I had available (i.e. WSMAX less the current allocation) and used a large chunk of that as the copy buffer.  I *think* it was faster than the normal COPY, but that was hard to confirm because on the first run the file got cached on the SAN, after which read access times fell quite a bit.  A 'search' built the same way might be very fast.
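
The buffer-size question can be tried outside VMS as well. Below is a minimal Python sketch, not the program described above: `memory_budget` is a hypothetical stand-in for the "WSMAX less the current allocation" figure (no VMS system service is called), and the function names are illustrative. It copies a file through one large buffer and times repeated runs, since, as described above, the first run also warms the cache and skews a single measurement.

```python
import os
import tempfile
import time

def choose_buffer_size(memory_budget, fraction=0.5, floor=64 * 1024):
    """Use a chunk of the available memory as the buffer. memory_budget is a
    hypothetical stand-in for 'WSMAX less current allocation'."""
    return max(floor, int(memory_budget * fraction))

def copy_with_buffer(src, dst, buffer_size):
    """Copy src to dst, reading and writing up to buffer_size bytes at a time."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        buf = bytearray(buffer_size)
        view = memoryview(buf)
        while True:
            n = fin.readinto(buf)
            if not n:  # 0 bytes read means end of file
                break
            fout.write(view[:n])

def time_copy(src, dst, buffer_size, runs=3):
    """Return per-run timings. The first run may be slower because it also
    warms the cache, so compare the later runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        copy_with_buffer(src, dst, buffer_size)
        timings.append(time.perf_counter() - start)
    return timings

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.dat")
        dst = os.path.join(tmp, "output.dat")
        with open(src, "wb") as f:
            f.write(os.urandom(8 * 1024 * 1024))  # 8 MB of test data
        budget = 64 * 1024 * 1024  # hypothetical memory budget
        for size in (8 * 1024, choose_buffer_size(budget)):
            print(size, ["%.4f" % t for t in time_copy(src, dst, size)])
```

On VMS itself the comparable experiment would more likely vary the RMS block and buffer counts rather than hand-roll a buffer, but the measuring caveat is the same: discard or separate out the cold-cache run.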