Re: [RFC v2] blame: new option --prefer-first to better handle merged cherry-picks

Junio C Hamano <gitster@xxxxxxxxx> writes:

> The "pick the one that exactly matches if exists" can be thought of
> an easy hack to hide the problems that come from this arbitrary
> choice.  ...
> Instead, "pass the whole blame to the one that exactly matches" hack
> keeps larger blocks of text unsplit, clumping related contents
> together as long as possible while we traverse the history.
>
> It is an "easy hack", because we only need to compare the object
> name, but a logical extension to it would have been to compute the
> similarity scores between the result and each of the parents, sort
> the parents by that similarity score order, and give more similar
> ones a chance to claim responsibility before less similar ones.
> We could call it "favouring similar ones", i.e. "--prefer-similar"
> or something.

Extending along the tangent further.

Another thing I found weak in the argument in the proposed log
message of the patch is the claim that the changed code will assign
the blame to the "same" commit for both paths b and c.  There are two
reasons why.  One is that we do not look at b while chasing the
ancestry of c, so if a different traversal order assigns the blame to
the same commit for them, it is mere happenstance.  But a more
important reason is that the changed code will still assign the
blame to "different" commits if the final merge were made in the
opposite direction.  In your original topology, we skip over the
first parent and give the whole blame to the second parent without
the change, and with the change, we stop doing so and instead give
some blame to the first parent and then allow the second parent a
chance to claim the blame for the remainder.  But in a history where
the final merge went in the opposite direction, even with the
change, we compare the "first" parent (which was the "second"
one in your original topology) with the result, find out that the
contents exactly match, and that parent grabs the whole blame.  So
in that sense, the updated code that "consistently" gives earlier
parents a chance to claim the blame before later ones does not behave
consistently on the same history with a different merge parent order.
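
To make the order dependence concrete, here is a toy model in Python.
It is only a sketch, not git's blame code: the function name
pass_blame_in_order, the side names X and Y, and the use of difflib
for line matching are all assumptions made for illustration.  Each
parent, in the order given, claims the result lines it shares that
nobody has claimed yet.

```python
from difflib import SequenceMatcher


def pass_blame_in_order(result, parents):
    """Toy model (hypothetical, not git's code): parents claim lines in order.

    `result` is a list of lines; `parents` is a list of (name, lines).
    Returns, for each result line, the name of the parent that claimed it,
    or None if no parent could claim it.
    """
    owner = [None] * len(result)
    for name, lines in parents:
        matcher = SequenceMatcher(None, lines, result, autojunk=False)
        for block in matcher.get_matching_blocks():
            for i in range(block.b, block.b + block.size):
                if owner[i] is None:
                    owner[i] = name
    return owner


result = ["one", "two", "three"]
side_x = ["one", "two"]            # shares only part of the result
side_y = ["one", "two", "three"]   # matches the result exactly

# Merge made in one direction: X is the first parent.
x_first = pass_blame_in_order(result, [("X", side_x), ("Y", side_y)])
# Same history, merge made in the opposite direction: Y is the first parent.
y_first = pass_blame_in_order(result, [("Y", side_y), ("X", side_x)])

print(x_first)  # ['X', 'X', 'Y']
print(y_first)  # ['Y', 'Y', 'Y']
```

The two sides and the result are identical in both runs; only the
parent order differs, and so does the blame.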

That makes me think that the reason why the result you got with the
change is better (assuming it is better) is _not_ because the
updated code gives earlier parents a chance to claim the blame first;
it could be an indication that the "keep larger blocks of text
unsplit, clumping related contents together as long as possible"
heuristic is what prevents us from having a better result.

If that is really the case, that would mean that letting the blame
split early would give us a better result.  I alluded to "give more
similar parents first chance to claim responsibility before less
similar ones" in the previous message, but perhaps this indicates
that we might get a better result if we did the opposite.  Instead
of assigning blame to earlier parents and then to later ones, we
would compare the result with each parent, order the parents by how
few lines of blame they could claim if each of them were allowed to
go first, and then actually compute and assign the blame in that
order, "favouring dissimilar ones".  That may produce the result you
are after in a more consistent way, regardless of the merge order.
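
A rough Python sketch of that ordering follows.  It is only an
illustration under stated assumptions, not a proposed implementation:
the names claimable and blame_favouring_dissimilar are hypothetical,
difflib stands in for the real diff machinery, and "how few lines a
parent could claim" is approximated by the number of result lines it
shares with the result.

```python
from difflib import SequenceMatcher


def claimable(result, parent_lines):
    """Indices of result lines this parent could claim if it went first."""
    matcher = SequenceMatcher(None, parent_lines, result, autojunk=False)
    claimed = set()
    for block in matcher.get_matching_blocks():
        claimed.update(range(block.b, block.b + block.size))
    return claimed


def blame_favouring_dissimilar(result, parents):
    """Hypothetical sketch: least similar parent claims first.

    Ties keep the given parent order (sorted() is stable), so this toy
    version is order-independent only when the claim counts differ.
    """
    ranked = sorted(parents, key=lambda p: len(claimable(result, p[1])))
    owner = [None] * len(result)
    for name, lines in ranked:
        for i in claimable(result, lines):
            if owner[i] is None:
                owner[i] = name
    return owner


result = ["one", "two", "three"]
side_x = ["one", "two"]            # could claim 2 lines
side_y = ["one", "two", "three"]   # could claim all 3 lines

x_first = blame_favouring_dissimilar(result, [("X", side_x), ("Y", side_y)])
y_first = blame_favouring_dissimilar(result, [("Y", side_y), ("X", side_x)])

print(x_first)  # ['X', 'X', 'Y']
print(y_first)  # ['X', 'X', 'Y']
```

In this small case the blame comes out the same whichever way the
merge was made, because the ranking depends on the contents and not
on the parent positions.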

I think I'm done thinking about this issue, at least for now.