Re: git add -p breaks after split on change at the top of the file

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 20 Aug 2017 09:25:42 -0700

Jeff King <peff@xxxxxxxx> writes:

> So that's one question I'm puzzled by: why does it work without an edit,
> but fail with one.

As this is the part of the system I long time ago declared a "works
only some of the time" hack, I do not recall the exact details, but
a key phrase "DIRTY" in the old discussion thread made me look.

Doesn't coalesce_overlapping_hunks in add-interactive.perl play a
role in the difference?  I _think_ that one tries to do the right
thing by enabling the logic before the "(e)dit" hack was introduced;
I haven't actually checked out and looked the code before the
addition of the option, but "add-p" should back then have actually
understood the offsets and hunk length and merged the surviving
hunks as needed to avoid feeding overlaps to "git apply"---it may
even have fixed the offset before doing so.  Edited hunks whose
mechanical integrity has become dubious are marked with DIRTY, as
the hunk touched by "(e)dit" codepath, by the time it reaches that
helper function, no longer has trustable information in the correct
offset information there.

> My second question is how "add -p" could generate the corrected hunk
> headers.
>
> In the non-edit case, we know that our second hunk's leading context
> must always match the first hunk's trailing context (since that's how we
> do a split). So we'd know that the preimage "-pos" field of the second
> hunk header should be offset by the net sum of the added/changed lines
> for any previous split hunks.
>
> In the original case we had net two added lines in the first hunk. So if
> it's selected, that's how our "-1" becomes "-3". And after the edit, we
> should be able to apply the same, I think? We have a single added line,
> so it becomes "-2". That doesn't require knowing what was in the hunk
> before the user edited at all. We just need to know which other hunks
> it's split from, so we know which overlapping hunks we need to take into
> account when recomputing.
>
> "add -p" certainly has that information. I _think_ git-apply also has
> that information if it bothers to find out.

I am not sure about the latter.  If we declare that we assume the
user never touches the hunk header line (i.e. "@@ ... @@") when
editing the patch text (i.e. " /-/+" lines), the receiving end can
probably guess by counting how many preimage and postimage lines are
described in the hunk and then seeing the change from the numbers on
the hunk header.  

I do not think the "--recount/--overlap" code trusts its input that
much in the current code, and there may be a room to tweak that band
aid with such an additional assumption.

But I dunno.  Once we go that route, and then if we want to assume
less about random things the end-user may do to make it more robust,
"add -p" would need to keep the hunk header before giving the hunk
to the user, and then replace the (possibly modified) hunk header in
the edited patch with the original to tell the receiving end what
the original numbers were to help --recount logic.  At that point, I
do not think it is too big a stretch for the "(e)dit" to actually
correct the hunk headers to not just express the correct offset but
also show the right number of lines, so that it does not have to be
marked with the "DIRTY" bit.  That way, coalesce_overlapping_hunks
logic can finish it off just like other "picked but unedited" hunks.

And when that happens, we may not even need to run "apply" with the
"--recount" and "--allow-overlap" options from this codepath to go
back to the state before 8cbd4310 ("git-add--interactive: replace
hunk recounting with apply --recount", 2008-07-02), where "add -p"
did not fly blind and always knew (or at least "supposed to know")
everything that is happening, which IMO is a preferrable endgame.
If the language spoken between "add -p" and "apply" is "patch", we
should try to stick to that language.  Butchering the listener to
allow this speaker to speak in loose and ambiguous terms, especially
when we can teach the speaker to do so, was a big mistake.