Re: [PATCHv1bis 1/2] git apply: option to ignore whitespace differences

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 02 Jul 2009 16:55:55 -0700

By the way, I think we need to make sure your understanding of how the
current code works matches mine before you go any further.

Are the words "preimage", "postimage" and "target" used consistently
between us?  By these words, I mean:

  preimage = the lines prefixed with '-' and ' ' in the patch

  postimage = the lines prefixed with ' ' and '+' in the patch

  target = lines in the file being patched that corresponds to the preimage

The point of patch application is to find a block of lines in the target
that matches preimage, and replace that block with postimage.  When the
patch applies cleanly (which is the case we should optimize for), the
preimage match the target byte-for-byte.  The hunk starting at line 1690
does a memcmp of the whole thing, without ws fuzz, for this reason.  You
do not want to touch that part with your patch (and that is why I am
writing this message to make sure you understand what you are doing).

After that, as a fallback, we compare line-by-line, while fixing the
whitespace breakage in the preimage (what the patch author based on) and
the target (what we currently have).  The reason for the loop is because
we are interested in two cases:

 (1) The patch was made against an old code without recent whitespace fix
     we already have.

 (2) The patch was made against a code with whitespace fix we do not have
     yet.

In either case, preimage and target won't match byte-for-byte, but by
applying the whitespace breakage on each of the preimage line and the
corresponding target line, they will match in either of the above cases.
While doing this "convert-and-match", we prepare a version of preimage
with whitespace breakage fixed to give to update_pre_post_images() at the
end of the function in fixed_buf.

The contents of fixed_buf is used to update the preimage and the postimage
by calling update_pre_post_images().  This is to avoid reverting the
whitespace fix we already had in the target when we are in situation (1).
The postimage is what replaces the block of lines in the image that
matched the preimage, so this step is essential.

This is another point I am worried about your patch.  Suppose you have this
target:

    a a a
    b b b
    c c
    d
    e e

And we have a broken patch that needs --ignore-whitespace to apply:

    diff --git a/file b/file
    index xxxxxx..yyyyyy 100644
    @@ -1,4, +1,5 @@
     a  a  a
     b b  b
    +q
     c  c
       d

Your preimage is "a  a  a\nb b  b\nc  c\n  d\n",
target is        "a a a\nb b b\nc c\nd\ne e\n",
and postimage is "a  a  a\nb b  b\nq\nc  c\n  d\n".

Wouldn't you want to have this as the result of patch application?

    a a a
    b b b
    q
    c c
    d
    e e

With whitespace squashed, the preimage would match the target (perhaps
after fixing line_matches()), but wsfix_copy() called while we fix each
preimage line won't have changed anything in the fixed_buf that is to
become the new preimage, and update_pre_post_images() while copying the
fixed preimage to the postimage won't have corrected "a a a" back to "a a
a" that was in the target as the result.

So I suspect that you would instead end up with:

    a  a  a
    b b  b
    c  c
      d
    e e

I think the intent of --ignore-whitespace is "don't worry about ws
differences in the context when locating where to make the change", and it
is not "I do not care about getting whitespace mangled anywhere in the
file the patch touches."  correct_ws_error is special in that we can
afford to take the fixed pre/postimage, "because we are fixing the ws
breakage anyway", but arguably it _might_ be nicer to limit the change to
the lines marked with '-' and '+' in the patch even in that case.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html