Re: [PATCH 5/5] diff-highlight: document some non-optimal cases

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> wrote:

> The diff-highlight script works on heuristics, so it can be
> wrong. Let's document some of the wrong-ness in case
> somebody feels like working on it.
> 
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
> These were just some that I considered while looking at the output of
> the original and the current code. Suggestions are welcome for more.

I would also add (feel free to reword if my English is not very clear):

3. Sometimes the prefix or suffix is very small but because it exists,
almost whole line is highlighted. For example:

----------------------------------------------
--{This is a very long line}.
++{I like apples}.
----------------------------------------------

or:

----------------------------------------------
-Th-{is is an apple}
+Th+{at was a car}
----------------------------------------------


> 
>  contrib/diff-highlight/README |   93
> +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 93
> insertions(+)
> 
> diff --git a/contrib/diff-highlight/README
> b/contrib/diff-highlight/README index 4a58579..502e03b 100644
> --- a/contrib/diff-highlight/README
> +++ b/contrib/diff-highlight/README
> @@ -57,3 +57,96 @@ following in your git configuration:
>  	show = diff-highlight | less
>  	diff = diff-highlight | less
>  ---------------------------------------------
> +
> +Bugs
> +----
> +
> +Because diff-highlight relies on heuristics to guess which parts of
> +changes are important, there are some cases where the highlighting is
> +more distracting than useful. Fortunately, these cases are rare in
> +practice, and when they do occur, the worst case is simply a little
> +extra highlighting. This section documents some cases known to be
> +sub-optimal, in case somebody feels like working on improving the
> +heuristics.
> +
> +1. Two changes on the same line get highlighted in a blob. For
> example,
> +   highlighting:
> +
> +----------------------------------------------
> +-foo(buf, size);
> ++foo(obj->buf, obj->size);
> +----------------------------------------------
> +
> +   yields (where the inside of "+{}" would be highlighted):
> +
> +----------------------------------------------
> +-foo(buf, size);
> ++foo(+{obj->buf, obj->}size);
> +----------------------------------------------
> +
> +   whereas a more semantically meaningful output would be:
> +
> +----------------------------------------------
> +-foo(buf, size);
> ++foo(+{obj->}buf, +{obj->}size);
> +----------------------------------------------
> +
> +   Note that doing this right would probably involve a set of
> +   content-specific boundary patterns, similar to word-diff.
> Otherwise
> +   you get junk like:
> +
> +-----------------------------------------------------
> +-this line has some -{i}nt-{ere}sti-{ng} text on it
> ++this line has some +{fa}nt+{a}sti+{c} text on it
> +-----------------------------------------------------
> +
> +   which is less readable than the current output.
> +
> +2. The multi-line matching assumes that lines in the pre- and
> post-image
> +   match by position. This is often the case, but can be fooled when
> a
> +   line is removed from the top and a new one added at the bottom (or
> +   vice versa). Unless the lines in the middle are also changed,
> diffs
> +   will show this as two hunks, and it will not get highlighted at
> all
> +   (which is good). But if the lines in the middle are changed, the
> +   highlighting can be misleading. Here's a pathological case:
> +
> +-----------------------------------------------------
> +-one
> +-two
> +-three
> +-four
> ++two 2
> ++three 3
> ++four 4
> ++five 5
> +-----------------------------------------------------
> +
> +   which gets highlighted as:
> +
> +-----------------------------------------------------
> +-one
> +-t-{wo}
> +-three
> +-f-{our}
> ++two 2
> ++t+{hree 3}
> ++four 4
> ++f+{ive 5}
> +-----------------------------------------------------
> +
> +   because it matches "two" to "three 3", and so forth. It would be
> +   nicer as:
> +
> +-----------------------------------------------------
> +-one
> +-two
> +-three
> +-four
> ++two +{2}
> ++three +{3}
> ++four +{4}
> ++five 5
> +-----------------------------------------------------
> +
> +   which would probably involve pre-matching the lines into pairs
> +   according to some heuristic.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]