Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty

Phillip Wood <phillip.wood123@xxxxxxxxx> · Thu, 18 Jul 2024 15:23:10 +0100

On 11/07/2024 22:26, Jeff King wrote:
On Wed, Jul 10, 2024 at 02:46:30PM +0100, Phillip Wood wrote:

It's tempting to say that we should just make sure that we generate a
diff without that option. But we may parse diffs not only generated by
Git, but ones that users have manually edited. And POSIX calls the
decision of whether to print the space here "implementation-defined".

Do we ever parse an edited hunk with this code? I'm not sure there is a
way of splitting a hunk after it has been edited as I don't think we
ever display it again.

Hmm, I just assumed this code would see the edited hunk, but now I'm not
sure. Note that the changes required do go outside of split_hunk(); the
initial parse_diff() needs to decide whether the hunk is splittable in
the first place (as an aside, that puzzled me at first why just changing
split_hunk() was enough for the case that started this thread, but not
the one in the included test. The difference is that without the empty
line as context, the hunk in the test wouldn't be splittable at all).

But looking closer: yes, I do think parse_diff() is used only for the
initial patch.

That's true but I've realized that we do in fact allow the user to 
re-display and split an edited hunk. If there are two changes in a file 
then you can edit the first hunk and when prompted about the second 
press 'K' to go back to the first one and then split it with 's' (I 
messed up my test for this before sending my previous mail as I changed 
two separate files rather than putting two changes in a single file). So 
split_hunk() can encounter empty context lines even without 
diff.suppressBlankEmpty being set as lots of editors strip the leading 
space when the rest of the line is empty[1].

As we haven't had any bug reports about that I suspect people are not 
splitting the hunks they edited which I guess makes sense. I think there 
is another bug lurking for anyone who does try to split an edited hunk 
as we don't update `hunk->splittable_into` after it has been edited and 
the edit might change a deleted of an empty line to a context line or 
vice versa. We should make sure any garbage at the end of the hunk is 
discarded as well so we don't show it when we display the edited hunk.

Best Wishes

Phillip

[1] When I added the code to recount the hunk header rather than relying 
on "git apply --recount" initially it did not support empty context 
lines and we received quite a few bug reports pretty quickly after it 
was released.

So we really would only see git-generated patches here.
Which I think takes away my ambiguity concern, but does mean the commit
message is wrong.

I don't think we'd ever generate this ourselves, but could somebody
manually edit a hunk into this shape? When I tried it in practice, it
looks like we fail to apply the result even before my patch, though. I'm
not sure why that is. If I put "some garbage" without the blank line, we
correctly realize it should be discarded. It's possible I'm just holding
it wrong.

When we recount the hunk after it has been edited we ignore lines that
don't begin with '+', '-', ' ', or '\n' so if you add some garbage at
the end of the hunk the recounted hunk header excludes it when it gets
applied.

OK, that makes sense. And we could never rely on the hunk header in the
edited hunk anyway, since the whole point is that we have to recount it.
So the user must accept that an extra blank line is potential context
(and that goes all the way back to bcdd297b78 (built-in add -p:
implement hunk editing, 2019-12-13).

I think your patch looks good. I did wonder if we wanted to fix this
by normalizing context lines instead as shown in the diff below. That
might make it less likely to miss adding "|| '\n'" in future code that
is looking for a context line but I don't have a strong preference
either way.

Yeah, I had a similar thought, but it got messy because we have to deal
with the source buffer. But the extra "char ch" you added in the patch
below fixes that. I think the result is better.

Looking at the blank-line handling in recount_edited_hunk(), we also
handle a CRLF empty line there. Should we do so here, too? If so, then
it would just be a matter of touching normalize_marker() in your patch.

Do you want to just re-send your patch with a commit message to replace
mine? (Feel free to steal the non-wrong parts of my message ;) ).

---- >8 ----

diff --git a/add-patch.c b/add-patch.c
index d8ea05ff108..795aa772b7a 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -400,6 +400,12 @@ static void complete_file(char marker, struct hunk *hunk)
  		hunk->splittable_into++;
  }
+/* Empty context lines may omit the leading ' ' */
+static int normalize_marker(char marker)
+{
+	return marker == '\n' ? ' ' : marker;
+}
+
  static int parse_diff(struct add_p_state *s, const struct pathspec *ps)

Minor nit: missing blank line between functions.

-Peff