[Info-vax] Looking for some text search ideas

VAXman- at SendSpamHere.ORG VAXman- at SendSpamHere.ORG
Sat Sep 27 16:49:46 EDT 2014


In article <m074bg$qtm$2 at Iltempo.Update.UU.SE>, Johnny Billquist <bqt at softjar.se> writes:
>On 2014-09-27 13:23, VAXman- @SendSpamHere.ORG wrote:
>> In article <m064e6$r81$1 at Iltempo.Update.UU.SE>, Johnny Billquist <bqt at softjar.se> writes:
>>> On 2014-09-27 03:17, David Froble wrote:
>>>> johnwallace4 at yahoo.co.uk wrote:
>>>>
>>>>> Me: Find a way to split the workload across multiple concurrent
>>>>> threads of execution, which then means it can be spread across
>>>>> multiple processors, thereby reducing the time to completion. 4
>>>>> processors => roughly a quarter of the single-processor elapsed time,
>>>>> though the actual effort required will increase slightly... maybe.
>>>>
>>>> I'm not sure I have a clue on how to do such a thing.
>>>>
>>>> Regardless, a small fraction of a second to perform the task, before any
>>>> optimization, is more than adequate.
>>>>
>>>> Frankly, I just don't see any way to optimize a search through data. The
>>>> brute force of just checking all of it sequentially seems the only
>>>> option.  But what do I know?  Could be, someone has come up with
>>>> something smart that could do a quicker job.  And so I throw the
>>>> question out to some smart people.
>>>>
>>>> :-)
>>>
>>> Hmm. People show a lack in CS education here. :-)
>>> If you want to search for a substring within a large gob of memory, then
>>> yes, a brute force checking all characters sequentially is definitely
>>> not the best we've come up with.
>>> This was figured out a long time ago.
>>>
>>> Essentially, you can do much better. Not that it matters for really
>>> small gobs of memory, but if you're searching through a few megabytes,
>>> then your search can be sped up by orders of magnitude if you are more
>>> clever.
>>>
>>> I don't have time to write it down right now, but the basic idea is that
>>> you search forward, but match backwards. And if you don't match, you can
>>> figure out how much you can jump forward, which most of the time is more
>>> than just one character.
>>>
>>> If people don't know how to do this, I can write it down in more detail
>>> later.
>>
>> It has already been writtien down in great detail for all to reference in:
>>
>>    Knuth, Donald. E. 1973. The Art of Computer Programming, Vol. 3:
>>    Sorting and Searching , Addison-Wesley, Reading, Massachusetts
>
>Thanks. Now I have an excuse to not try and write it all down here. :-)
>Anyone who wants to know, you have the authoritative reference right there.

You're welcome.  FWIW, I have the first three AoCP volumes and they sit on
a rack right behind my desk for quick reference should reference be needed.
I'd expect any good programmer to have at least one of the three in his or
her possession.

-- 
VAXman- A Bored Certified VMS Kernel Mode Hacker    VAXman(at)TMESIS(dot)ORG

I speak to machines with the voice of humanity.



More information about the Info-vax mailing list