[Info-vax] Looking for some text search ideas

VAXman- at SendSpamHere.ORG VAXman- at SendSpamHere.ORG
Sat Sep 27 07:23:11 EDT 2014


In article <m064e6$r81$1 at Iltempo.Update.UU.SE>, Johnny Billquist <bqt at softjar.se> writes:
>On 2014-09-27 03:17, David Froble wrote:
>> johnwallace4 at yahoo.co.uk wrote:
>>
>>> Me: Find a way to split the workload across multiple concurrent
>>> threads of execution, which then means it can be spread across
>>> multiple processors, thereby reducing the time to completion. 4
>>> processors => roughly a quarter of the single-processor elapsed time,
>>> though the actual effort required will increase slightly... maybe.
>>
>> I'm not sure I have a clue on how to do such a thing.
>>
>> Regardless, a small fraction of a second to perform the task, before any
>> optimization, is more than adequate.
>>
>> Frankly, I just don't see any way to optimize a search through data. The
>> brute force of just checking all of it sequentially seems the only
>> option.  But what do I know?  Could be, someone has come up with
>> something smart that could do a quicker job.  And so I throw the
>> question out to some smart people.
>>
>> :-)
>
>Hmm. People show a lack in CS education here. :-)
>If you want to search for a substring within a large gob of memory, then 
>yes, a brute force checking all characters sequentially is definitely 
>not the best we've come up with.
>This was figured out a long time ago.
>
>Essentially, you can do much better. Not that it matters for really 
>small gobs of memory, but if you're searching through a few megabytes, 
>then your search can be sped up by orders of magnitude if you are more 
>clever.
>
>I don't have time to write it down right now, but the basic idea is that 
>you search forward, but match backwards. And if you don't match, you can 
>figure out how much you can jump forward, which most of the time is more 
>than just one character.
>
>If people don't know how to do this, I can write it down in more detail 
>later.

It has already been writtien down in great detail for all to reference in:

  Knuth, Donald. E. 1973. The Art of Computer Programming, Vol. 3:
  Sorting and Searching , Addison-Wesley, Reading, Massachusetts

-- 
VAXman- A Bored Certified VMS Kernel Mode Hacker    VAXman(at)TMESIS(dot)ORG

I speak to machines with the voice of humanity.



More information about the Info-vax mailing list