Re: [PATCH] patch-ids: handle duplicate hashmap entries

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 12 Jan 2021 11:13:06 -0800

Jeff King <peff@xxxxxxxx> writes:

> At the end, we'll have eliminated commits from both sides that have a
> matching patch-id on the other side. But there's a subtle assumption
> here: for any given patch-id, we must have exactly one struct
> representing it. If two commits from A both have the same patch-id and
> we allow duplicates in the hashmap, then we run into a problem:

In practical terms, for one side of the history to have two
identical patches, the most likely scenerio for that to happen is to
have a patch, its reversion, and its reapplication, intermixed in
its history with other commits, e.g.

    ---o---o---M---o---o---W---o---o---M---o--- ...
        \
         o---o---o---M---o--- ...

where "M" are commits that does an identical change, and "W"
(upside-down "M") is its reversion.  On the top history, "M" was
introduced, the bottom history cherry-picked, the top history found
problems in it and reverted with "W", and later (perhaps with the
help of preparatory patches on top of "W"), the top history now
considers that "M" is safe, so reapplies it.

And a "--cherry-pick" output that excludes "identical patches" that
appear on both sides on such a history would hide all "M"'s, leaving
a history like

    ---o---o-------o---o---W---o---o-------o--- ...
        \
         o---o---o-------o--- ...

But is this result what the user really wanted to see, I have to
wonder.

I do not see any problem in the patch itself.  We used to hide only
one "M" from the history on the top half in the picture, leaving one
"M" and "W", while hiding the sole "M" from the bottom half.  Now if
we want to no longer show any "M", the updated code would correctly
hide all of them.

It just feels to me that the resulting view of the history look
weird, leaving only the reversion of a patch that has never been
applied in the first place.