Re: [PATCH v2 4/5] Make boundary characters for --color-words configurable

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 07 May 2008 13:02:54 -0700

Jeff King <peff@xxxxxxxx> writes:

> But more big-picture, comparing the output of the old color words and
> this implementation, there is one thing I don't like: the new one
> doesn't bring together runs of additions and deletions, which can make
> parsing text much easier. For example:
>
>   $ echo This is a complete sentence. >one
>   $ echo Here is some totally different text. >two
>
>   # with old implementation; /-.../ is red, /+.../ is green
>   $ git diff --color-words one two
>   ...
>   /-This/ /+Here/ is /-a complete sentence./+some totally different text./
>
>   # with this patch
>   $ git diff --color-words one two
>   ...
>   /-This/+Here/ is /-a/+some/ /-complete/+totally/ /-sentence./+different text./

I suspect that heavily depends on the input text.  If you drop "different"
in the example, the output becomes:

    {-This|+Here} is {-a|+some} {-complete|+totally} {-sentence.|+text.}

which is totally sensible.

You can get output that is closer to the original by tweaking the
definition of what a token is.  For example, you can define a token as "0
or more non-whitespace characters followed by 1 or more whitespace
characters", and then the internal diff would become ($ shows the end of
line):

    -This $
    +Here $
     is $
    -a $
    -complete $
    -sentence.$
    +some $
    +totally $
    +different $
    +text.$

which would yield on the output:

    {-This |+Here }is {-a complete sentence.|+some totally different text.}

It's all in diff_words_tokenize(), which I kept deliberately stupid so
that people can tweak it to their liking.
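
To make the effect of the token definition concrete, here is a rough
sketch in Python.  It is not the C code in diff_words_tokenize(); the two
regular expressions, the word_diff() helper and the use of difflib in
place of the internal diff machinery are all assumptions made for
illustration, so other inputs may group differently than git would.

```python
import re
from difflib import SequenceMatcher

# Hypothetical token patterns, for illustration only.
# Assumed default: words and whitespace runs are separate tokens.
WORDS_AND_GAPS = r"\S+|\s+"
# "0 or more non-whitespace characters followed by 1 or more whitespace
# characters"; the second alternative catches a final word with no
# trailing whitespace.
WORD_WITH_TRAILING_WS = r"\S*\s+|\S+"


def word_diff(old, new, pattern):
    """Diff two strings token by token and mark changes as {-old|+new}."""
    a = re.findall(pattern, old)
    b = re.findall(pattern, new)
    out = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed = "".join(a[i1:i2])
        added = "".join(b[j1:j2])
        if tag == "equal":
            out.append(removed)
        elif tag == "replace":
            out.append("{-%s|+%s}" % (removed, added))
        elif tag == "delete":
            out.append("{-%s}" % removed)
        else:  # "insert"
            out.append("{+%s}" % added)
    return "".join(out)


old = "This is a complete sentence."
new = "Here is some totally different text."

print(word_diff(old, new, WORDS_AND_GAPS))
# {-This|+Here} is {-a|+some} {-complete|+totally} {-sentence.|+different text.}
print(word_diff(old, new, WORD_WITH_TRAILING_WS))
# {-This |+Here }is {-a complete sentence.|+some totally different text.}
```

With the second pattern the whitespace travels with the word before it, so
the unchanged spaces no longer break up the changed runs, and everything
after "is " comes out as a single deletion followed by a single addition.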