Re: `git diff`/`git apply` can generate/apply ambiguous hunks (ie. in the wrong place) (just like gnu diff/patch)

Emanuel Czirai <correabuscar+gitML@xxxxxxxxx> · Thu, 4 Jul 2024 12:24:00 +0200

I wasn't aware of this until now, but by searching the `bug-patch`
mailing list for the word "ambiguous" I've found only this result from
2011:
https://lists.gnu.org/archive/html/bug-patch/2011-08/msg00007.html
So the idea I've suggested isn't new(I'd guess anyone in this domain
would think of it as a potential solution).
I'll paste the relevant part:

On 08/16/2011 05:22 PM, Junio C Hamano wrote:

Given a patch that is not precise and can apply to multiple places,
"patch" and/or "git apply" can apply it to a place you may not have
intended. It may feel like a bug if that happens to a preimage that is
bit-for-bit identical to the version you prepared your patch is against,
but I would rather think, instead of blaming "patch" and/or "git apply",
it would be more productive to prepare a patch with larger context when
you know that the preimage file you are patching has many similar looking
lines, to make it _impossible_ for it to apply to places different from
what you intended.

Eric Blake<address@hidden>  writes:

Yes, I know that as well - the particular patch that sparked this
thread was ambiguous with three lines of context, but unambiguous with
6 lines, even when an offset still had to be applied. So maybe you
raise yet another feature proposal: What would it take for git to
generate unambiguous patches - that is, when generating a patch with
context, to then ensure that given the file it is being applied to,
then context is auto-widened until there are no other offsets where
the patch can possibly be applied? In other words, if I say 'git diff
HEAD^ --auto-context', then the resulting patch would have
automatically have 6 context lines for my problematic hunk, while
sticking to the default 3 context lines for other hunks that were
already unambiguous. Of course, this only protects you if starting
from the same version of the file (since any other patch can introduce
an ambiguity not present at the time you computed the minimal context
needed for non-ambiguity in your version of the pre-patch file).

And there are links to two issues for gnu diff/patch:
https://savannah.gnu.org/bugs/index.php?34031
https://savannah.gnu.org/bugs/index.php?34032

The first issue has a patch from 2023, but it's unclear to me if this
is at all useful for our issue which seems to be the same as the
second issue(which has no patch).

On git's mailing list (this one), there are over 7000 results for the
word "ambiguous", so I haven't even tried.

On Wed, Jul 3, 2024 at 5:21 PM Emanuel Czirai
<correabuscar+gitML@xxxxxxxxx> wrote:
>
> This doesn't affect `git rebase` as it's way more robust than simply extracting the commits as patches and re-applying them. (I haven't looked into `git merge` though, but I doubt it's affected)
>
> It seems to me that no matter which algorithm `git diff` uses  (of the 4), it generates the same hunk, because it's really the context length that matters (which by default is 3), which is the same one that gnu `diff` generates, and its application via `git apply` is the same as gnu `patch`.
>
> side note:
> `diffy` is a simple rust-written library that behaves (at version 0.4.0) the same as normal diff and patch apply(with regards to generated diff contents and patch application; except it doesn't set the original/modified filenames in the patch), and since my limited experience, I've found it simpler to modify it to make it so that it generates unambiguous hunks, as a proof of concept that it can be done this way, here: https://github.com/bmwill/diffy/issues/31 , whereby increasing the context length(lines) of the whole patch(though ideally only of the affected hunks) the initially ambiguous hunk(s) cannot be applied anymore in more than 1 place inside the original file, thus avoiding both the diff creation and the patch application from generating and applying ambiguous hunks.
>
> But forgetting about that for a moment, I'm gonna show you about `git diff` and `git apply` below:
> 1. clone cargo's repo:
> cd /tmp
> git clone https://github.com/rust-lang/cargo.git
> cd cargo
> 2. checkout tag 0.76.0, or branch origin/rust-1.75.0
> git checkout 0.76.0
> 3. manually edit this file ./src/cargo/core/workspace.rs at line 1118
> or manually go to function:
> pub fn emit_warnings(&self) -> CargoResult<()> {
> right before that function's end which looks like:
> Ok(())
> }
> so there at line 1118, insert above that Ok(()) something, doesn't matter what, doesn't have to make sense, like:
> if seen_any_warnings {
>   //comment
>   bail!("reasons");
> }
> save the file
> 4. try to generate a diff from it:
> git diff > /tmp/foo.patch
> you get this:
> ```diff
> diff --git a/src/cargo/core/workspace.rs b/src/cargo/core/workspace.rs
> index 4667c8029..8f0a74473 100644
> --- a/src/cargo/core/workspace.rs
> +++ b/src/cargo/core/workspace.rs
> @@ -1115,6 +1115,10 @@ impl<'cfg> Workspace<'cfg> {
>                  }
>              }
>          }
> +        if seen_any_warnings {
> +            //comment
> +            bail!("reasons");
> +        }
>          Ok(())
>      }
>
> ```
> Now this is an ambiguous patch/hunk because if you try to apply this patch on the same original file, cumulatively, it applies successfully in 3 different places, but we won't do that now.
> 5. now let's discard it(we already have it saved in /tmp/foo.patch) and pretend that something changed in the original code:
> git checkout .
> git checkout 0.80.0
> 6. reapply that patch on the new changes:
> git apply /tmp/foo.patch
> (this shows no errors)
> 7. look at the diff, for no good reason, just to see that it's the same(kind of):
> git diff > /tmp/foo2.patch
> contents:
> ```diff
> diff --git a/src/cargo/core/workspace.rs b/src/cargo/core/workspace.rs
> index 55bfb7a10..92709f224 100644
> --- a/src/cargo/core/workspace.rs
> +++ b/src/cargo/core/workspace.rs
> @@ -1099,6 +1099,10 @@ impl<'gctx> Workspace<'gctx> {
>                  }
>              }
>          }
> +        if seen_any_warnings {
> +            //comment
> +            bail!("reasons");
> +        }
>          Ok(())
>      }
>
> ```
> 8. now look at the file, where was the patch applied? that's right, it's at the end of the wrong function:
> fn validate_manifest(&mut self) -> CargoResult<()> {
> vim src/cargo/core/workspace.rs +1040
> jump at the end of it at line 1107, you see it's our patch there, applied in the wrong spot!
>
> Hopefully, depending on the change, this kind of patch which applied in the wrong place, will be caught at (rust)compile time (in my case, it was this) or worse, at (binary)runtime.
>
> With the aforementioned `diffy` patch, the generated diff would actually be with a context of 4, to make it unambiguous, so it would've been this:
> ```diff
> --- original
> +++ modified
> @@ -1186,8 +1186,12 @@
>                      self.gctx.shell().warn(msg)?
>                  }
>              }
>          }
> +        if seen_any_warnings {
> +            //use anyhow::bail;
> +            bail!("Compilation failed due to cargo warnings! Manually done this(via cargo patch) so that things like the following (ie. dep key packages= and using rust pre 1.26.0 which ignores it, downloads squatted package) will be avoided in the future: https://github.com/rust-lang/rust/security/advisories/GHSA-phjm-8x66-qw4r";);
> +        }
>          Ok(())
>      }
>
>      pub fn emit_lints(&self, pkg: &Package, path: &Path) -> CargoResult<()> {
> ```
> this hunk is now unambiguous because it cannot be applied in more than 1 place in the original file, furthermore patching a modified future file will fail(with that `diffy` patch, and ideally, with `git apply` if any changes are implemented to "fix" this issue) if any of the hunks can be independently(and during the full patch application too) applied in more than 1 place.
>
> I consider that I don't know enough to understand how `git diff`/`git apply` works internally (and similarly, gnu `diff`/`patch`) to actually change them and make them generate unambiguous hunks where only the hunks that would've been ambiguous have increased context size, instead of the whole patch have increased context size for all hunks(which is what I did for `diffy` too so far, in that proof of concept patch), therefore if a "fix" is deemed necessary(it may not be, as I might've missed something and I'm unaware of it, so a fix may be messing other things up, who knows?!) then I hope someone much more knowledgeable could implement it(maybe even for gnu diff/patch too), and while I don't think that a "please" would be enough, I'm still gonna say it: please do so, if so inclined.
>
> Thank you for your time and consideration.
>
>