Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty

Phillip Wood <phillip.wood123@xxxxxxxxx> · Wed, 10 Jul 2024 14:46:30 +0100

Hi Peff

On 10/07/2024 10:36, Jeff King wrote:
Subject: add-patch: handle splitting hunks with diff.suppressBlankEmpty

When "add -p" parses diffs, it looks for context lines starting with a
single space. But when diff.suppressBlankEmpty is in effect, an empty
context line will omit the space, giving us a true empty line. This
confuses the parser, which is unable to split based on such a line.

It's tempting to say that we should just make sure that we generate a
diff without that option. But we may parse diffs not only generated by
Git, but ones that users have manually edited. And POSIX calls the
decision of whether to print the space here "implementation-defined".

Do we ever parse an edited hunk with this code? I'm not sure there is a
way of splitting a hunk after it has been edited as I don't think we
ever display it again.

So let's handle both cases: a context line either starts with a space or
consists of a totally empty line.

Reported-by: Ilya Tumaykin <itumaykin@xxxxxxxxx>
Signed-off-by: Jeff King <peff@xxxxxxxx>
---
I'm a little worried that this creates ambiguities, since I don't think
we are careful about following the hunk header's line counts. Imagine
you had an input like this:

   @@ -1,2 +1,2 @@> 
   -old
   +new
    stuff

   some garbage

We obviously know that "some garbage" is not a context line and is just
trailing junk, because it does not begin with "-", "+" or space. But
what about the blank line in between? It looks like an empty context
line, but we can only know that it is not by respecting the counts in
the hunk header.

I don't think we'd ever generate this ourselves, but could somebody
manually edit a hunk into this shape? When I tried it in practice, it
looks like we fail to apply the result even before my patch, though. I'm> not sure why that is. If I put "some garbage" without the blank line, we
correctly realize it should be discarded. It's possible I'm just holding
it wrong.

When we recount the hunk after it has been edited we ignore lines that
don't begin with '+', '-', ' ', or '\n' so if you add some garbage at
the end of the hunk the recounted hunk header excludes it when it gets
applied.

I think your patch looks good. I did wonder if we wanted to fix this
by normalizing context lines instead as shown in the diff below. That
might make it less likely to miss adding "|| '\n'" in future code that
is looking for a context line but I don't have a strong preference
either way.

Best Wishes

Phillip

---- >8 ----

diff --git a/add-patch.c b/add-patch.c
index d8ea05ff108..795aa772b7a 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -400,6 +400,12 @@ static void complete_file(char marker, struct hunk *hunk)
 		hunk->splittable_into++;
 }
 
+/* Empty context lines may omit the leading ' ' */
+static int normalize_marker(char marker)
+{
+	return marker == '\n' ? ' ' : marker;
+}
+
 static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 {
 	struct strvec args = STRVEC_INIT;
@@ -485,6 +491,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 	while (p != pend) {
 		char *eol = memchr(p, '\n', pend - p);
 		const char *deleted = NULL, *mode_change = NULL;
+		char ch = normalize_marker(*p);
 
 		if (!eol)
 			eol = pend;
@@ -532,7 +539,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 			 * Start counting into how many hunks this one can be
 			 * split
 			 */
-			marker = *p;
+			marker = ch;
 		} else if (hunk == &file_diff->head &&
 			   starts_with(p, "new file")) {
 			file_diff->added = 1;
@@ -586,10 +593,10 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
 			    (int)(eol - (plain->buf + file_diff->head.start)),
 			    plain->buf + file_diff->head.start);
 
-		if ((marker == '-' || marker == '+') && *p == ' ')
+		if ((marker == '-' || marker == '+') && ch == ' ')
 			hunk->splittable_into++;
-		if (marker && *p != '\\')
-			marker = *p;
+		if (marker && ch != '\\')
+			marker = ch;
 
 		p = eol == pend ? pend : eol + 1;
 		hunk->end = p - plain->buf;
@@ -953,7 +960,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
 	context_line_count = 0;
 
 	while (splittable_into > 1) {
-		ch = s->plain.buf[current];
+		ch = normalize_marker(s->plain.buf[current]);
 
 		if (!ch)
 			BUG("buffer overrun while splitting hunks");
diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh
index 5d78868ac16..385c246baf0 100755
--- a/t/t3701-add-interactive.sh
+++ b/t/t3701-add-interactive.sh
@@ -1164,4 +1164,31 @@ test_expect_success 'reset -p with unmerged files' '
 	test_must_be_empty staged
 '
 
+test_expect_success 'hunk splitting works with diff.suppressBlankEmpty' '
+	cat >expect <<-\EOF &&
+	diff --git a/file b/file
+	index 777b174..ebc9684 100755
+	--- a/file
+	+++ b/file
+	@@ -2,6 +2,6 @@ p
+	 q
+	 r
+
+	-d
+	-e
+	-f
+	+s
+	+t
+	+u
+	EOF
+
+	test_config diff.suppressBlankEmpty true &&
+	test_write_lines a b c "" d e f >file &&
+	git add file &&
+	test_write_lines p q r "" s t u >file &&
+	test_write_lines s y n q | git add -p &&
+	git diff >actual &&
+	diff_cmp expect actual
+'
+
 test_done