[Info-vax] DCL DIFF algorithm
FredK
fred.nospam at dec.com
Tue Nov 9 09:24:47 EST 2010
"glen herrmannsfeldt" <gah at ugcs.caltech.edu> wrote in message
news:ibave3$hfj$1 at news.eternal-september.org...
> DTL <didier.morandi at gmail.com> wrote:
>
>> Who could give me (discretely or not) the algorithm used by the DCL
>> Differences verb, please?
>> I'd like to code it in VBScript but I have no clue on how to start.
>> Compare lines one by one will break the logic as soon as the first
>> different lines are encountered. All next lines will obviously be
>> because of the shift. This is this shift concern I wish to be
>> documented on.
>
> The unix diff command uses the dynamic programming algorithms
> that originated for comparing protein sequences in biology.
>
> Having not thought about this for a while, I believe that they
> generate a hash value for each line, then run the dynamic
> programming algorithm on the hash values. That finds the
> optimal set if insertions, substitutions, and deletions to
> go from on to the other.
>
> I don't know why DCL would do it differently. The algorithms
> were well known. The main one is the Needleman-Wunsch algorithm,
> first published in 1970. (That is, some years before VMS 1.0.)
>
The DIF code is circa 1981, and is written in BLISS. It's VMS-specific. It
knows how to ignore VMS-specific and language specific comments based on
file (extension) type, and uses a window (a block of input lines from each
file) to slide around looking to match lines and find differences.
More information about the Info-vax
mailing list