Re: [PATCHv1bis 1/2] git apply: option to ignore whitespace differences

Giuseppe Bilotta <giuseppe.bilotta@xxxxxxxxx> · Fri, 3 Jul 2009 08:40:01 +0200

On Fri, Jul 3, 2009 at 1:55 AM, Junio C Hamano<gitster@xxxxxxxxx> wrote:
> By the way, I think we need to make sure your understanding of how the
> current code works matches mine before you go any further.

Sounds reasonable.

> Are the words "preimage", "postimage" and "target" used consistently
> between us?  By these words, I mean:
>
>  preimage = the lines prefixed with '-' and ' ' in the patch
>
>  postimage = the lines prefixed with ' ' and '+' in the patch
>
>  target = lines in the file being patched that correspond to the preimage

It _did_ take me a little while to understand the names when I started
working on the feature, but I got on track pretty soon (at the first
segfault ;-)).

> The point of patch application is to find a block of lines in the target
> that matches preimage, and replace that block with postimage.  When the
> patch applies cleanly (which is the case we should optimize for), the
> preimage matches the target byte-for-byte.  The hunk starting at line 1690
> does a memcmp of the whole thing, without ws fuzz, for this reason.  You
> do not want to touch that part with your patch (and that is why I am
> writing this message to make sure you understand what you are doing).

Of course.
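
Just to spell out what I understand the fast path to be, here is a
minimal sketch. The helper name exact_match_at() and its signature are
made up for illustration; this is not the code in builtin-apply.c, only
the idea of comparing the whole preimage against the target with a
single memcmp and no whitespace fuzz.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the actual git code): returns 1 when the
 * preimage matches the target byte-for-byte starting at offset pos,
 * and 0 otherwise. No whitespace fuzz is applied here.
 */
static int exact_match_at(const char *target, size_t target_len,
                          const char *preimage, size_t preimage_len,
                          size_t pos)
{
    assert(target != NULL && preimage != NULL);

    /* The preimage must fit entirely inside the target at pos. */
    if (pos > target_len || preimage_len > target_len - pos)
        return 0;

    /* One comparison over the whole block: the clean-apply case. */
    return memcmp(target + pos, preimage, preimage_len) == 0;
}
```

Only when this returns 0 would any of the fuzzy matching come into play.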

> After that, as a fallback, we compare line-by-line, while fixing the
> whitespace breakage in the preimage (what the patch author based on) and
> the target (what we currently have).

> [...] preimage and target won't match byte-for-byte, but by
> applying the whitespace breakage on each of the preimage line and the
> corresponding target line, they will match in either of the above cases.
> While doing this "convert-and-match", we prepare a version of preimage
> with whitespace breakage fixed to give to update_pre_post_images() at the
> end of the function in fixed_buf.

Indeed. This is why in my 2/2 patch I do a similar operation to bring
the preimage whitespace to match the target whitespace if matching was
done ignoring whitespace (but we never got to that part, for obvious
reasons).
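
For reference, the kind of comparison I have in mind when "matching
ignoring whitespace" looks roughly like the sketch below. The function
name and the exact rule are assumptions for the sake of discussion:
here any run of spaces or tabs compares equal to any other run, and
leading and trailing whitespace is ignored altogether. The real
line_matches() may well need a different rule.

```c
#include <assert.h>
#include <stddef.h>

static int is_blank(char c)
{
    return c == ' ' || c == '\t';
}

/*
 * Hypothetical sketch: compare two lines (given without their trailing
 * newline) treating every run of blanks as equivalent and ignoring
 * leading and trailing blanks. Returns 1 on match, 0 otherwise.
 */
static int lines_match_ignoring_ws(const char *a, size_t alen,
                                   const char *b, size_t blen)
{
    size_t i = 0, j = 0;

    assert(a != NULL && b != NULL);

    /* Leading whitespace does not count. */
    while (i < alen && is_blank(a[i])) i++;
    while (j < blen && is_blank(b[j])) j++;

    while (i < alen && j < blen) {
        if (is_blank(a[i]) && is_blank(b[j])) {
            /* Both sides have a run of blanks: skip both runs. */
            while (i < alen && is_blank(a[i])) i++;
            while (j < blen && is_blank(b[j])) j++;
            continue;
        }
        if (a[i] != b[j])
            break;
        i++;
        j++;
    }

    /* Trailing whitespace does not count either. */
    while (i < alen && is_blank(a[i])) i++;
    while (j < blen && is_blank(b[j])) j++;

    return i == alen && j == blen;
}
```

Note that "ab" and "a b" deliberately do not match under this rule:
whitespace may change in amount, but not appear where there was none.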

> This is another point I am worried about your patch.  Suppose you have this
> target:
>
>    a a a
>    b b b
>    c c
>    d
>    e e
>
> And we have a broken patch that needs --ignore-whitespace to apply:
>
>    diff --git a/file b/file
>    index xxxxxx..yyyyyy 100644
>    @@ -1,4 +1,5 @@
>     a  a  a
>     b b  b
>    +q
>     c  c
>       d
>
> Your preimage is "a  a  a\nb b  b\nc  c\n  d\n",
> target is        "a a a\nb b b\nc c\nd\ne e\n",
> and postimage is "a  a  a\nb b  b\nq\nc  c\n  d\n".
>
> Wouldn't you want to have this as the result of patch application?
>
>    a a a
>    b b b
>    q
>    c c
>    d
>    e e
>
> With whitespace squashed, the preimage would match the target (perhaps
> after fixing line_matches()), but wsfix_copy() called while we fix each
> preimage line won't have changed anything in the fixed_buf that is to
> become the new preimage, and update_pre_post_images() while copying the
> fixed preimage to the postimage won't have corrected "a  a  a" back to
> "a a a" that was in the target as the result.
>
> So I suspect that you would instead end up with:
>
>    a  a  a
>    b b  b
>    c  c
>      d
>    e e

This is indeed the case with my 1/2 patch: no whitespace adjustment is
done on the pre- and postimage when the preimage and target match with
whitespace fuzz and ignore_whitespace is active. In the first RFC I
sent I expressly mentioned that this was what I did not like about my
patch. When I first sent a _series_, it was made of two patches, the
second of which served the purpose of realigning the whitespace of
the patch (pre- and postimage) to the whitespace of the target (at
least for the common lines).
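
To make the intended result concrete, here is a rough sketch of that
realignment on your example. It is not the code from my 2/2 patch;
build_postimage() and its array-of-lines interface are invented for
illustration. Context lines are copied from the target, so they keep
the target's whitespace, and only the '+' lines are taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: build the postimage of one hunk.
 * hunk[i] is a patch line including its leading ' ', '-' or '+' marker;
 * target[] holds the target lines the preimage was matched against, in
 * order. Returns the number of bytes written (excluding the NUL), or -1
 * on a malformed hunk or insufficient space.
 */
static int build_postimage(const char *const *hunk, size_t nhunk,
                           const char *const *target, size_t ntarget,
                           char *out, size_t outsz)
{
    size_t t = 0, used = 0;

    assert(hunk != NULL && target != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    for (size_t i = 0; i < nhunk; i++) {
        const char *src;
        size_t len;

        switch (hunk[i][0]) {
        case ' ':               /* context: keep the target's version */
            if (t >= ntarget)
                return -1;
            src = target[t++];
            break;
        case '-':               /* removed: consume the target line */
            if (t >= ntarget)
                return -1;
            t++;
            continue;
        case '+':               /* added: take the line from the patch */
            src = hunk[i] + 1;
            break;
        default:
            return -1;
        }

        len = strlen(src);
        if (used + len + 2 > outsz)     /* line + '\n' + final NUL */
            return -1;
        memcpy(out + used, src, len);
        used += len;
        out[used++] = '\n';
    }
    out[used] = '\0';
    return (int)used;
}
```

Fed the hunk from your example and the first four target lines, this
yields "a a a\nb b b\nq\nc c\nd\n", which is the result you describe
as the desirable one.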

> I think the intent of --ignore-whitespace is "don't worry about ws
> differences in the context when locating where to make the change", and it
> is not "I do not care about getting whitespace mangled anywhere in the
> file the patch touches."

I totally agree. This is important because it also means that when
re-diffing the applied patch you still get changes ONLY in the lines
where you SHOULD get changes, and not in the nearby context that only
had different whitespace.

I sent the thing in two patches to make it easier to review. If you
think it's more appropriate to squash them, I can do that no problem.

> correct_ws_error is special in that we can
> afford to take the fixed pre/postimage, "because we are fixing the ws
> breakage anyway", but arguably it _might_ be nicer to limit the change to
> the lines marked with '-' and '+' in the patch even in that case.

But that's a path we're not going to hit in match_fragment when
ignoring whitespace. Instead, one thing we could consider in this case
(ignore_whitespace) is to adjust the leading space in the + lines to
match the ws transformations done in the context lines, but that might
be taking whitespace fixing a little too far. Or we should rename it
to whitespace=adjust...

-- 
Giuseppe "Oblomov" Bilotta