RE: `git diff`/`git apply` can generate/apply ambiguous hunks (ie. in the wrong place) (just like gnu diff/patch)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wednesday, July 3, 2024 11:25 AM, Emanuel Czirai wrote:
>This doesn't affect `git rebase` as it's way more robust than simply extracting the
>commits as patches and re-applying them. (I haven't looked into `git merge` though,
>but I doubt it's affected)
>
>It seems to me that no matter which algorithm `git diff` uses  (of the 4), it generates
>the same hunk, because it's really the context length that matters (which by default
>is 3), which is the same one that gnu `diff` generates, and its application via `git
>apply` is the same as gnu `patch`.
>
>side note:
>`diffy` is a simple rust-written library that behaves (at version 0.4.0) the same as
>normal diff and patch apply(with regards to generated diff contents and patch
>application; except it doesn't set the original/modified filenames in the patch), and
>since my limited experience, I've found it simpler to modify it to make it so that it
>generates unambiguous hunks, as a proof of concept that it can be done this way,
>here:
>https://github.com/bmwill/diffy/issues/31 , whereby increasing the context
>length(lines) of the whole patch(though ideally only of the affected hunks) the
>initially ambiguous hunk(s) cannot be applied anymore in more than 1 place inside
>the original file, thus avoiding both the diff creation and the patch application from
>generating and applying ambiguous hunks.
>
>But forgetting about that for a moment, I'm gonna show you about `git diff` and `git
>apply` below:
>1. clone cargo's repo:
>cd /tmp
>git clone https://github.com/rust-lang/cargo.git
>cd cargo
>2. checkout tag 0.76.0, or branch origin/rust-1.75.0 git checkout 0.76.0 3. manually
>edit this file ./src/cargo/core/workspace.rs at line 1118 or manually go to function:
>pub fn emit_warnings(&self) -> CargoResult<()> { right before that function's end
>which looks like:
>Ok(())
>}
>so there at line 1118, insert above that Ok(()) something, doesn't matter what,
>doesn't have to make sense, like:
>if seen_any_warnings {
>  //comment
>  bail!("reasons");
>}
>save the file
>4. try to generate a diff from it:
>git diff > /tmp/foo.patch
>you get this:
>```diff
>diff --git a/src/cargo/core/workspace.rs b/src/cargo/core/workspace.rs index
>4667c8029..8f0a74473 100644
>--- a/src/cargo/core/workspace.rs
>+++ b/src/cargo/core/workspace.rs
>@@ -1115,6 +1115,10 @@ impl<'cfg> Workspace<'cfg> {
>                 }
>             }
>         }
>+        if seen_any_warnings {
>+            //comment
>+            bail!("reasons");
>+        }
>         Ok(())
>     }
>
>```
>Now this is an ambiguous patch/hunk because if you try to apply this patch on the
>same original file, cumulatively, it applies successfully in 3 different places, but we
>won't do that now.
>5. now let's discard it(we already have it saved in /tmp/foo.patch) and pretend that
>something changed in the original code:
>git checkout .
>git checkout 0.80.0
>6. reapply that patch on the new changes:
>git apply /tmp/foo.patch
>(this shows no errors)
>7. look at the diff, for no good reason, just to see that it's the same(kind of):
>git diff > /tmp/foo2.patch
>contents:
>```diff
>diff --git a/src/cargo/core/workspace.rs b/src/cargo/core/workspace.rs index
>55bfb7a10..92709f224 100644
>--- a/src/cargo/core/workspace.rs
>+++ b/src/cargo/core/workspace.rs
>@@ -1099,6 +1099,10 @@ impl<'gctx> Workspace<'gctx> {
>                 }
>             }
>         }
>+        if seen_any_warnings {
>+            //comment
>+            bail!("reasons");
>+        }
>         Ok(())
>     }
>
>```
>8. now look at the file, where was the patch applied? that's right, it's at the end of
>the wrong function:
>fn validate_manifest(&mut self) -> CargoResult<()> { vim
>src/cargo/core/workspace.rs +1040 jump at the end of it at line 1107, you see it's
>our patch there, applied in the wrong spot!
>
>Hopefully, depending on the change, this kind of patch which applied in the wrong
>place, will be caught at (rust)compile time (in my case, it was this) or worse, at
>(binary)runtime.
>
>With the aforementioned `diffy` patch, the generated diff would actually be with a
>context of 4, to make it unambiguous, so it would've been this:
>```diff
>--- original
>+++ modified
>@@ -1186,8 +1186,12 @@
>                     self.gctx.shell().warn(msg)?
>                 }
>             }
>         }
>+        if seen_any_warnings {
>+            //use anyhow::bail;
>+            bail!("Compilation failed due to cargo warnings! Manually
>+ done
>this(via cargo patch) so that things like the following (ie. dep key packages= and
>using rust pre 1.26.0 which ignores it, downloads squatted
>package) will be avoided in the future:
>https://github.com/rust-lang/rust/security/advisories/GHSA-phjm-8x66-qw4r";);
>+        }
>         Ok(())
>     }
>
>     pub fn emit_lints(&self, pkg: &Package, path: &Path) -> CargoResult<()> { ``` this
>hunk is now unambiguous because it cannot be applied in more than 1 place in the
>original file, furthermore patching a modified future file will fail(with that `diffy`
>patch, and ideally, with `git apply` if any changes are implemented to "fix" this issue)
>if any of the hunks can be independently(and during the full patch application too)
>applied in more than 1 place.
>
>I consider that I don't know enough to understand how `git diff`/`git apply` works
>internally (and similarly, gnu `diff`/`patch`) to actually change them and make them
>generate unambiguous hunks where only the hunks that would've been ambiguous
>have increased context size, instead of the whole patch have increased context size
>for all hunks(which is what I did for `diffy` too so far, in that proof of concept patch),
>therefore if a "fix" is deemed necessary(it may not be, as I might've missed
>something and I'm unaware of it, so a fix may be messing other things up, who
>knows?!) then I hope someone much more knowledgeable could implement
>it(maybe even for gnu diff/patch too), and while I don't think that a "please" would
>be enough, I'm still gonna say it: please do so, if so inclined.
>
>Thank you for your time and consideration.

You make good points, but Rust code should not be put into the main git code base as it will break many non-GNU platforms. Perhaps rewriting it is C to be compatible with the git code-base.
--Randall






[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux