[Info-vax] Looking for some text search ideas
Johnny Billquist
bqt at softjar.se
Sat Sep 27 15:49:05 EDT 2014
On 2014-09-27 13:23, VAXman- @SendSpamHere.ORG wrote:
> In article <m064e6$r81$1 at Iltempo.Update.UU.SE>, Johnny Billquist <bqt at softjar.se> writes:
>> On 2014-09-27 03:17, David Froble wrote:
>>> johnwallace4 at yahoo.co.uk wrote:
>>>
>>>> Me: Find a way to split the workload across multiple concurrent
>>>> threads of execution, which then means it can be spread across
>>>> multiple processors, thereby reducing the time to completion. 4
>>>> processors => roughly a quarter of the single-processor elapsed time,
>>>> though the actual effort required will increase slightly... maybe.
>>>
>>> I'm not sure I have a clue on how to do such a thing.
>>>
>>> Regardless, a small fraction of a second to perform the task, before any
>>> optimization, is more than adequate.
>>>
>>> Frankly, I just don't see any way to optimize a search through data. The
>>> brute force of just checking all of it sequentially seems the only
>>> option. But what do I know? Could be, someone has come up with
>>> something smart that could do a quicker job. And so I throw the
>>> question out to some smart people.
>>>
>>> :-)
>>
>> Hmm. People show a lack in CS education here. :-)
>> If you want to search for a substring within a large gob of memory, then
>> yes, a brute force checking all characters sequentially is definitely
>> not the best we've come up with.
>> This was figured out a long time ago.
>>
>> Essentially, you can do much better. Not that it matters for really
>> small gobs of memory, but if you're searching through a few megabytes,
>> then your search can be sped up by orders of magnitude if you are more
>> clever.
>>
>> I don't have time to write it down right now, but the basic idea is that
>> you search forward, but match backwards. And if you don't match, you can
>> figure out how much you can jump forward, which most of the time is more
>> than just one character.
>>
>> If people don't know how to do this, I can write it down in more detail
>> later.
>
> It has already been writtien down in great detail for all to reference in:
>
> Knuth, Donald. E. 1973. The Art of Computer Programming, Vol. 3:
> Sorting and Searching , Addison-Wesley, Reading, Massachusetts
Thanks. Now I have an excuse to not try and write it all down here. :-)
Anyone who wants to know, you have the authoritative reference right there.
Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: bqt at softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol
More information about the Info-vax
mailing list