[Info-vax] DCL DIFF algorithm

Tue Nov 9 09:24:47 EST 2010

"glen herrmannsfeldt" <gah at ugcs.caltech.edu> wrote in message 
news:ibave3$hfj$1 at news.eternal-september.org...
> DTL <didier.morandi at gmail.com> wrote:
>
>> Who could give me (discretely or not) the algorithm used by the DCL
>> Differences verb, please?
>> I'd like to code it in VBScript but I have no clue on how to start.
>> Compare lines one by one will break the logic as soon as the first
>> different lines are encountered. All next lines will obviously be
>> because of the shift. This is this shift concern I wish to be
>> documented on.
>
> The unix diff command uses the dynamic programming algorithms
> that originated for comparing protein sequences in biology.
>
> Having not thought about this for a while, I believe that they
> generate a hash value for each line, then run the dynamic
> programming algorithm on the hash values.  That finds the
> optimal set if insertions, substitutions, and deletions to
> go from on to the other.
>
> I don't know why DCL would do it differently.  The algorithms
> were well known.  The main one is the Needleman-Wunsch algorithm,
> first published in 1970.  (That is, some years before VMS 1.0.)
>

The DIF code is circa 1981, and is written in BLISS.  It's VMS-specific.  It 
knows how to ignore VMS-specific and language specific comments based on 
file (extension) type, and uses a window (a block of input lines from each 
file) to slide around looking to match lines and find differences.