Re: [PATCH] Documentation/diff-options: explain different diff algorithms

Stefan Beller <sbeller@xxxxxxxxxx> · Thu, 9 Aug 2018 12:51:53 -0700

On Mon, Aug 6, 2018 at 4:18 PM Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:

> > +DIFF ALGORITHMS
> > +---------------
>
> Please add some introductory words about what the headings refer to.

OK.

>
> > +the shortest output.
>
> Trivia: the `minimal` variant of Myers doesn't guarantee shortest
> output, either: what it minimizes is the number of lines marked as
> added or removed.  If you want to minimize context lines too, then
> that would be a new variant. ;-)

... and take line length into account. ;-)

It minimizes the edit distance in terms of lines, i.e. in a context-less
diff we get the lowest possible number of added and removed lines.
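
To make "edit distance in terms of lines" concrete, here is a rough
Python sketch. It is not Git's xdiff code; the function name is made up
for illustration, and it uses the textbook LCS dynamic program rather
than Myers' algorithm. It only counts the lines that would be marked
'-' or '+', ignoring context lines and line length:

```python
def min_changed_lines(old, new):
    """Smallest number of '-' plus '+' lines in any line-based diff."""
    # lcs[i][j] = length of the longest common subsequence of
    # old[i:] and new[j:], where each element is one whole line.
    lcs = [[0] * (len(new) + 1) for _ in range(len(old) + 1)]
    for i in range(len(old) - 1, -1, -1):
        for j in range(len(new) - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    common = lcs[0][0]
    # Every old line not kept is removed; every new line not kept is added.
    return (len(old) - common) + (len(new) - common)


old = ["a", "b", "c", "d"]
new = ["a", "c", "d", "e"]
print(min_changed_lines(old, new))  # 2: "b" removed, "e" added
```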

> > +This algorithm finds the longest common substring and recursively
> > +diffs the content before and after the longest common substring.
>
> optional: may be worth a short aside in the text about the distinction
> between LCS and LCS. ;-)
>
> It would be especially useful here, since the alphabet used in these
> strings is *lines* instead of characters, so the first-time reader
> could probably use some help in building their intuition.

That makes sense.
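
For the intuition, something like the following might help. It is only
an illustrative Python sketch with helper names of my own choosing, not
what Git implements. Each line is treated as one symbol of the
alphabet: a common substring must be a run of consecutive lines, while
a common subsequence may skip lines in between.

```python
def longest_common_substring(a, b):
    """Longest run of consecutive lines that appears in both a and b."""
    best_len, best_end = 0, 0
    # prev[j] = length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def longest_common_subsequence(a, b):
    """Longest list of lines found in both, in order, gaps allowed."""
    n, m = len(a), len(b)
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                length[i][j] = length[i + 1][j + 1] + 1
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])
    # Walk the table from the top-left corner to recover one such subsequence.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif length[i + 1][j] >= length[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


old = ["int a;", "int b;", "int c;", "return 0;"]
new = ["int a;", "int x;", "int b;", "int c;", "return 0;"]
print(longest_common_substring(old, new))    # the last three lines of old
print(longest_common_subsequence(old, new))  # all four lines of old
```

The inserted "int x;" line breaks the consecutive run, so the substring
stops at three lines, whereas the subsequence simply skips over it.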

>
> > +This is often the fastest, but in corner cases (when there are
> > +many common substrings of the same length) it produces bad
>
> Can you clarify what "bad" means?  E.g. would "unexpected", or "poorly
> aligned", match what you mean?

I'll just go with "unexpected".

> > +results as seen in:
> > +
> > +     seq 1 100 >one
> > +     echo 99 > two
> > +     seq 1 2 98 >>two
> > +     git diff --no-index --histogram one two