Re: [RFC/PATCH] add-patch: handle splitting hunks with diff.suppressBlankEmpty

Jeff King <peff@xxxxxxxx> · Thu, 11 Jul 2024 17:26:28 -0400

On Wed, Jul 10, 2024 at 02:46:30PM +0100, Phillip Wood wrote:

> > It's tempting to say that we should just make sure that we generate a
> > diff without that option. But we may parse diffs not only generated by
> > Git, but ones that users have manually edited. And POSIX calls the
> > decision of whether to print the space here "implementation-defined".
> 
> Do we ever parse an edited hunk with this code? I'm not sure there is a
> way of splitting a hunk after it has been edited as I don't think we
> ever display it again.

Hmm, I just assumed this code would see the edited hunk, but now I'm not
sure. Note that the changes required do go outside of split_hunk(); the
initial parse_diff() needs to decide whether the hunk is splittable in
the first place (as an aside, that puzzled me at first why just changing
split_hunk() was enough for the case that started this thread, but not
the one in the included test. The difference is that without the empty
line as context, the hunk in the test wouldn't be splittable at all).

But looking closer: yes, I do think parse_diff() is used only for the
initial patch. So we really would only see git-generated patches here.
Which I think takes away my ambiguity concern, but does mean the commit
message is wrong.

> > I don't think we'd ever generate this ourselves, but could somebody
> > manually edit a hunk into this shape? When I tried it in practice, it
> > looks like we fail to apply the result even before my patch, though. I'm
> > not sure why that is. If I put "some garbage" without the blank line, we
> > correctly realize it should be discarded. It's possible I'm just holding
> > it wrong.
> 
> When we recount the hunk after it has been edited we ignore lines that
> don't begin with '+', '-', ' ', or '\n' so if you add some garbage at
> the end of the hunk the recounted hunk header excludes it when it gets
> applied.

OK, that makes sense. And we could never rely on the hunk header in the
edited hunk anyway, since the whole point is that we have to recount it.
So the user must accept that an extra blank line is potential context
(and that goes all the way back to bcdd297b78 (built-in add -p:
implement hunk editing, 2019-12-13).

> I think your patch looks good. I did wonder if we wanted to fix this
> by normalizing context lines instead as shown in the diff below. That
> might make it less likely to miss adding "|| '\n'" in future code that
> is looking for a context line but I don't have a strong preference
> either way.

Yeah, I had a similar thought, but it got messy because we have to deal
with the source buffer. But the extra "char ch" you added in the patch
below fixes that. I think the result is better.

Looking at the blank-line handling in recount_edited_hunk(), we also
handle a CRLF empty line there. Should we do so here, too? If so, then
it would just be a matter of touching normalize_marker() in your patch.

Do you want to just re-send your patch with a commit message to replace
mine? (Feel free to steal the non-wrong parts of my message ;) ).

> ---- >8 ----
> diff --git a/add-patch.c b/add-patch.c
> index d8ea05ff108..795aa772b7a 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -400,6 +400,12 @@ static void complete_file(char marker, struct hunk *hunk)
>  		hunk->splittable_into++;
>  }
> +/* Empty context lines may omit the leading ' ' */
> +static int normalize_marker(char marker)
> +{
> +	return marker == '\n' ? ' ' : marker;
> +}
> +
>  static int parse_diff(struct add_p_state *s, const struct pathspec *ps)

Minor nit: missing blank line between functions.

-Peff