On Wed, Jul 3, 2024 at 11:01 PM Johannes Sixt <j6t@xxxxxxxx> wrote: > > Am 03.07.24 um 17:24 schrieb Emanuel Czirai: > > With the aforementioned `diffy` patch, the generated diff would actually be > > with a context of 4, to make it unambiguous, so it would've been this: > > ```diff > > --- original > > +++ modified > > @@ -1186,8 +1186,12 @@ > > self.gctx.shell().warn(msg)? > > } > > } > > } > > + if seen_any_warnings { > > + //use anyhow::bail; > > + bail!("reasons"); > > + } > > Ok(()) > > } > > > > pub fn emit_lints(&self, pkg: &Package, path: &Path) -> > > CargoResult<()> { > > ``` > > this hunk is now unambiguous because it cannot be applied in more than 1 > > place in the original file, > > This assertion is wrong, assuming that the patch is to be applied to a > modified version of 'original'. There is nothing that can be done at the > time when a patch is generated to make it unambiguous, not even if the > entire file becomes context. The reason is that the modified 'original' > could now have the part duplicated that is the context in the patch, and > it would be possible to apply the patch any one of the duplicates. Which > one? > > -- Hannes > tl;dr: you're correct indeed, a hunk is truly unambiguous only when `git apply` determines that, and it thus depends on the target file to apply to; `git diff` can generate unambiguous hunks only with respect to the unchanged original file, but applying that hunk to a changed original file only determines if a hunk is relatively unambiguous, that is, unambiguous with respect to that changed original file. Both diff-ing and patch application are needed to ensure unambiguity. That is correct, which is why it's necessary to have `git apply` also modified so that it tries to apply each hunk independently to the now-modified original to see if it can be applied in more than 1 place, and if so, fail with the proper error message; in addition, after applying it, it would try to apply it again, just in case applying it created a new spot for it to be reapplied, and if it succeeds to re-apply it, it would fail again. I kind of mentioned this I believe, if not, my bad. Both `git diff` and `git apply` would have to each ensure that the hunks are unambiguous for the hunks to be considered unambiguous, therefore you're correct in your statement that my stating that the hunk is unambiguous right after generation via a fixed `git diff` is wrong. It's merely unambiguous only in the context of applying it to the original file, but as soon as a new modified original file is presented, it can be ambiguous as it is, unless `git apply` can determine that indeed it cannot be applied in more than 1 place in this new modified original file, but both: applied independently(just to be sure) and during the application of all previous hunks until then, maybe even more than these 2, because the following hunks can create new spots for it, so a reapply could make that hunk succeed even if all others would fail. Not entirely sure here, how to properly ensure this, that's why someone more knowledgeable might. Ah, I see that I actually tried to say this(the above) after that statement but I used wrong wording "will fail" instead of "should fail" or "would fail" (if fixed, that is): > this hunk is now unambiguous because it cannot be applied in more than 1 > place in the original file, furthermore patching a modified future file > will fail(with that `diffy` patch, and ideally, with `git apply` if any > changes are implemented to "fix" this issue) if any of the hunks can be > independently(and during the full patch application too) applied in more > than 1 place. For what's worth, barring my lack of explaining, the `diffy` patch I've mentioned already does this: makes sure the generated hunks are unambiguous at diff generation, and unambiguous at the time of the application of the patch. But definitely both are required. I guess, I call the hunks unambiguous at diff generation but it implies they're only unambiguous on the original file, and because patching is also "fixed", patching can ensure the hunks cannot be applied in more than in 1 spot and I thus call them unambiguous at patching as well. But both are required (diffing+patching) to ensure that indeed a hunk is truly unambiguous, therefore, you'd be right to say that any static diff file even if it had the whole file as context cannot or shouldn't be called unambiguous, because that's only half of it; because applying that diff as patch(which is the second half of it) is necessary to see if it's unambiguous as it depends on the target that it's applied on. Thus even if the generated hunks(or patch) is unambiguous from the point of view of having generated it, applying it to a modified original file can cause the patching to fail if ambiguity is detected. Seems better to fail(either at diff generation, if context length is too high, or at patch application if ambiguity is detected) than to silently succeed and hope that a compilation will catch it, or worse a runtime behavior is eventually spotted to be wrong. Rust files appear to be far more prone to this ambiguous diff generation due to rustfmt which formats them like seen above with 1 brace per line, and having the possibility of consecutive such braces... Thanks for your reply. Definitely need to think more about calling hunks unambiguous, without also specifying that they're truly only so only when applied(which really makes them only as strong as _relatively_ unambiguous, since they depend on the target file to be applied to). Maybe call them semi-unambiguous, or unambiguous with respect to original file, unclear yet. Perhaps whoever decides to make a patch for this issue can call them differently, or just be aware of this. Cheers! Have a great day everyone!