Re: `git diff`/`git apply` can generate/apply ambiguous hunks (ie. in the wrong place) (just like gnu diff/patch)

Emanuel Czirai <correabuscar+gitML@xxxxxxxxx> · Thu, 4 Jul 2024 09:15:35 +0200

On Wed, Jul 3, 2024 at 11:01 PM Johannes Sixt <j6t@xxxxxxxx> wrote:
>
> Am 03.07.24 um 17:24 schrieb Emanuel Czirai:
> > With the aforementioned `diffy` patch, the generated diff would actually be
> > with a context of 4, to make it unambiguous, so it would've been this:
> > ```diff
> > --- original
> > +++ modified
> > @@ -1186,8 +1186,12 @@
> >                      self.gctx.shell().warn(msg)?
> >                  }
> >              }
> >          }
> > +        if seen_any_warnings {
> > +            //use anyhow::bail;
> > +            bail!("reasons");
> > +        }
> >          Ok(())
> >      }
> >
> >      pub fn emit_lints(&self, pkg: &Package, path: &Path) ->
> > CargoResult<()> {
> > ```
> > this hunk is now unambiguous because it cannot be applied in more than 1
> > place in the original file,
>
> This assertion is wrong, assuming that the patch is to be applied to a
> modified version of 'original'. There is nothing that can be done at the
> time when a patch is generated to make it unambiguous, not even if the
> entire file becomes context. The reason is that the modified 'original'
> could now have the part duplicated that is the context in the patch, and
> it would be possible to apply the patch any one of the duplicates. Which
> one?
>
> -- Hannes
>

tl;dr: you're correct indeed, a hunk is truly unambiguous only when
`git apply` determines that, and it thus depends on the target file to
apply to; `git diff` can generate unambiguous hunks only with respect
to the unchanged original file, but applying that hunk to a changed
original file  only determines if a hunk is relatively unambiguous,
that is, unambiguous with respect to that changed original file. Both
diff-ing and patch application are needed to ensure unambiguity.

That is correct, which is why it's necessary to have `git apply` also
modified so that it tries to apply each hunk independently to the
now-modified original to see if it can be applied in more than 1
place, and if so, fail with the proper error message; in addition,
after applying it, it would try to apply it again, just in case
applying it created a new spot for it to be reapplied, and if it
succeeds to re-apply it, it would fail again. I kind of mentioned this
I believe, if not, my bad.
Both `git diff` and `git apply` would have to each ensure that the
hunks are unambiguous for the hunks to be considered unambiguous,
therefore you're correct in your statement that my stating that the
hunk is unambiguous right after generation via a fixed `git diff` is
wrong. It's merely unambiguous only in the context of applying it to
the original file, but as soon as a new modified original file is
presented, it can be ambiguous as it is, unless `git apply` can
determine that indeed it cannot be applied in more than 1 place in
this new modified original file, but both: applied independently(just
to be sure) and during the application of all previous hunks until
then, maybe even more than these 2, because the following hunks can
create new spots for it, so a reapply could make that hunk succeed
even if all others would fail. Not entirely sure here, how to properly
ensure this, that's why someone more knowledgeable might.

Ah, I see that I actually tried to say this(the above) after that
statement  but I used wrong wording "will fail" instead of "should
fail" or "would fail" (if fixed, that is):
> this hunk is now unambiguous because it cannot be applied in more than 1
> place in the original file, furthermore patching a modified future file
> will fail(with that `diffy` patch, and ideally, with `git apply` if any
> changes are implemented to "fix" this issue) if any of the hunks can be
> independently(and during the full patch application too) applied in more
> than 1 place.

For what's worth, barring my lack of explaining, the `diffy` patch
I've mentioned already does this: makes sure the generated hunks are
unambiguous at diff generation, and unambiguous at the time of the
application of the patch. But definitely both are required.

I guess, I call the hunks unambiguous at diff generation but it
implies they're only unambiguous on the original file, and because
patching is also "fixed", patching can ensure the hunks cannot be
applied in more than in 1 spot and I thus call them unambiguous at
patching as well. But both are required (diffing+patching) to ensure
that indeed a hunk is truly unambiguous, therefore, you'd be right to
say that any static diff file even if it had the whole file as context
cannot or shouldn't be called unambiguous, because that's only half of
it; because applying that diff as patch(which is the second half of
it) is necessary to see if it's unambiguous as it depends on the
target that it's applied on. Thus even if the generated hunks(or
patch) is unambiguous from the point of view of having generated it,
applying it to a modified original file can cause the patching to fail
if ambiguity is detected. Seems better to fail(either at diff
generation, if context length is too high, or at patch application if
ambiguity is detected) than to silently succeed and hope that a
compilation will catch it, or worse a runtime behavior is eventually
spotted to be wrong. Rust files appear to be far more prone to this
ambiguous diff generation due to rustfmt which formats them like seen
above with 1 brace per line, and having the possibility of consecutive
such braces...

Thanks for your reply. Definitely need to think more about calling
hunks unambiguous, without also specifying that they're truly only so
only when applied(which really makes them only as strong as
_relatively_ unambiguous, since they depend on the target file to be
applied to). Maybe call them semi-unambiguous, or unambiguous with
respect to original file, unclear yet. Perhaps whoever decides to make
a patch for this issue can call them differently, or just be aware of
this.

Cheers! Have a great day everyone!