Re: [PATCH v2 0/2] builtin add -p: fix hunk splitting

Phillip Wood <phillip.wood123@xxxxxxxxx> · Mon, 24 Jan 2022 11:10:47 +0000




Hi Dscho

On 22/01/2022 09:05, Johannes Schindelin wrote:
Hi Phillip,

first of all: thank you for these patches. I read over them and they have
my ACK.

Thanks

On Wed, 19 Jan 2022, Phillip Wood wrote:

On 12/01/2022 18:51, Junio C Hamano wrote:
Phillip Wood <phillip.wood123@xxxxxxxxx> writes:

I'm not sure I want to go with your extra changes. I've left some
comments on them below

@@ -488,12 +499,12 @@ static int parse_diff(struct add_p_state *s, const
struct pathspec *ps)
    else if (starts_with(p, "@@ ") ||
      (hunk == &file_diff->head &&
   		  (skip_prefix(p, "deleted file", &deleted)))) {
-		if (marker == '-' || marker == '+')
-			/*
-			 * Should not happen; previous hunk did not end
-			 * in a context line? Handle it anyway.
-			 */
+			hunk->splittable_into++;
+		/*
+		 * Should not increment "splittable_into";
+		 * previous hunk did not end in a context
+		 * line? Handle it anyway.
+		 */
+		complete_file(marker, &hunk->splittable_into);
      ALLOC_GROW_BY(file_diff->hunk, file_diff->hunk_nr, 1,
         file_diff->hunk_alloc);

I deliberately left this alone as I think we should probably make
this BUG() out instead of silently accepting an invalid diff.

FWIW this was overzealous defensive programming on my part. More on that
below.

As we are reading our own output, I agree that such a data error is
a BUG().

Indeed. I was less worried about the output format changing, and more
concerned with bugs in my parser ;-)

Although, having said that, I had meant to verify that `git add -p` cannot
be asked to produce and consume diffs with `-U0` when I wrote that
comment. Now I did that, and I am now confident that there is no way to ask
`git add -p` to generate and use context line-free diffs: we neither add
`-U<n>` in https://github.com/git/git/blob/v2.34.1/add-patch.c#L398-L417
nor do we call the user-facing `git diff` command that would interpret
`diff.context`, but instead we use `git diff-index` and `git diff-files`
(which ignore that config setting).

I did think about zero context diffs but realized that they can never be 
split so we don't need to worry about incrementing hunk->splittable_into 
in that case. It does mean that hunk->splittable_into will be zero in 
the -U0 case rather than one but I dont think that matters as we only 
care if it is >2 for splitting.

Best Wishes

Phillip

In any case, a helper to see if the file ended without post-context
is one thing, and a helper that specify what happens after we are
done with a single file, before we move on top the next file or
after processing the last file, is another thing.  The latter may be
able to make use of the former, but the latter may want to do more
than that in the future.

If you are concerned about the name of the function: maybe a better name
would be `maybe_increment_splittable_hunk_count(marker)`.


As complete_file() is about finalizing the processing we have done
to the current file, it should be used for that purpose, and nothing
else, I think the hunk I see at
https://github.com/git/git/commit/c082176f8c5a1fc1c8b2a93991ca28fd63aae73a
(reproduced below) is simply a nonsense.

Stepping back a bit, though, is this helper really finalizing the
current file, or is it finalizing the current hunk?  If it were the
latter, then its use in the hunk I called "nonsense" above actually
makes perfect sense.

Even if the helper is finalizing the current hunk then I think that "nonsense"
hunk would still wrong as it would be calling finalize_hunk() on _every_
context line in the hunk rather than just being called once to finalize the
hunk. We could call the function something like update_splittable() but then
we'd need to explain why we were calling that function at the start of a diff
and at the end of the loop.

Right. The point of this check is to see whether we missed counting a
splittable hunk. Then it makes more sense to call it at the beginning of a
file, at the end of a file _and_ at a context line.

Having said all that, I am really fine with what landed in `next`.

Thank you,
Dscho