Jeff King <peff@xxxxxxxx> wrote: > The diff-highlight script works on heuristics, so it can be > wrong. Let's document some of the wrong-ness in case > somebody feels like working on it. > > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > These were just some that I considered while looking at the output of > the original and the current code. Suggestions are welcome for more. I would also add (feel free to reword if my English is not very clear): 3. Sometimes the prefix or suffix is very small but because it exists, almost whole line is highlighted. For example: ---------------------------------------------- --{This is a very long line}. ++{I like apples}. ---------------------------------------------- or: ---------------------------------------------- -Th-{is is an apple} +Th+{at was a car} ---------------------------------------------- > > contrib/diff-highlight/README | 93 > +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 93 > insertions(+) > > diff --git a/contrib/diff-highlight/README > b/contrib/diff-highlight/README index 4a58579..502e03b 100644 > --- a/contrib/diff-highlight/README > +++ b/contrib/diff-highlight/README > @@ -57,3 +57,96 @@ following in your git configuration: > show = diff-highlight | less > diff = diff-highlight | less > --------------------------------------------- > + > +Bugs > +---- > + > +Because diff-highlight relies on heuristics to guess which parts of > +changes are important, there are some cases where the highlighting is > +more distracting than useful. Fortunately, these cases are rare in > +practice, and when they do occur, the worst case is simply a little > +extra highlighting. This section documents some cases known to be > +sub-optimal, in case somebody feels like working on improving the > +heuristics. > + > +1. Two changes on the same line get highlighted in a blob. For > example, > + highlighting: > + > +---------------------------------------------- > +-foo(buf, size); > ++foo(obj->buf, obj->size); > +---------------------------------------------- > + > + yields (where the inside of "+{}" would be highlighted): > + > +---------------------------------------------- > +-foo(buf, size); > ++foo(+{obj->buf, obj->}size); > +---------------------------------------------- > + > + whereas a more semantically meaningful output would be: > + > +---------------------------------------------- > +-foo(buf, size); > ++foo(+{obj->}buf, +{obj->}size); > +---------------------------------------------- > + > + Note that doing this right would probably involve a set of > + content-specific boundary patterns, similar to word-diff. > Otherwise > + you get junk like: > + > +----------------------------------------------------- > +-this line has some -{i}nt-{ere}sti-{ng} text on it > ++this line has some +{fa}nt+{a}sti+{c} text on it > +----------------------------------------------------- > + > + which is less readable than the current output. > + > +2. The multi-line matching assumes that lines in the pre- and > post-image > + match by position. This is often the case, but can be fooled when > a > + line is removed from the top and a new one added at the bottom (or > + vice versa). Unless the lines in the middle are also changed, > diffs > + will show this as two hunks, and it will not get highlighted at > all > + (which is good). But if the lines in the middle are changed, the > + highlighting can be misleading. Here's a pathological case: > + > +----------------------------------------------------- > +-one > +-two > +-three > +-four > ++two 2 > ++three 3 > ++four 4 > ++five 5 > +----------------------------------------------------- > + > + which gets highlighted as: > + > +----------------------------------------------------- > +-one > +-t-{wo} > +-three > +-f-{our} > ++two 2 > ++t+{hree 3} > ++four 4 > ++f+{ive 5} > +----------------------------------------------------- > + > + because it matches "two" to "three 3", and so forth. It would be > + nicer as: > + > +----------------------------------------------------- > +-one > +-two > +-three > +-four > ++two +{2} > ++three +{3} > ++four +{4} > ++five 5 > +----------------------------------------------------- > + > + which would probably involve pre-matching the lines into pairs > + according to some heuristic. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html