Re: [PATCH v2 8/8] revision.c: discount UNINTERESTING parents

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01/05/2013 00:18, Junio C Hamano wrote:

These rules paying more attention to UNINTERESTING do add a tricky
wrinkle to behaviour. Because limited revision lists are conventionally
expressed as A..B (ie B ^A), the bottom commit is UNINTERESTING.
OK.

Thus
its connection to the INTERESTING graph is not privileged over side
branches,
I take that "its connection" refers to the "===" link below, the
nodes connected with "---" form the "INTERESTING graph", and

      ....Z...A===X---o---o---B
           \\    /
            W---Y
"side branches" refer to the development that built W and Y and
merged at X.  And you are saying that A===X is not "privileged" over
"Y---X", with some definition of "privileged" I am not sure about.

Okay, that's a good graph. The basic problem is that all the rules above try to identify a merge from an irrelevant branch and eliminate it. But how can we define what a side branch is?

I think the rules I state are conceptually sound - a side branch is an extra parent coming in from outside our limited history graph. But the problem is at the bottom. In the event that someone specifies "A..B" with the above history, we get the resultant graph "W-Y-X-o-o-B". A is not on that graph. So with the rules as they stand, "A" being UNINTERESTING makes it get treated as a side branch of X, which isn't good. A needs to be INTERESTING for the purpose of side-branch logic.

So when someone says "A..B" and generates "W-Y-X-o-o-B", we want to know that X's parent path "(Z)-W-Y-X" is the (possibly irrelevant) side branch, not "(A)-X".

Example undesirable behaviour, with A treated as a side branch:

1) If only commit A changed our file, and merge X took "old" version Y for some reason, under these rules "--full-history A..B" would show nothing. X doesn't consider A because it's UNINTERESTING. If there had been an intervening (even irrelevant) commit A1 between A1 and X, X would have been shown.

2) If only commit A changed our file, and merge X took A, with this rules"--simplify-merges A..B" would show X, with two rewritten UNINTERESTING parents A and Z. That's not what we want - Z is an irrelevant side-branch in this case. If there had been an intervening (unsimplifiable) commit A1 between A and X, X would have had INTERESTING parent A1, and WY would have been successfully discarded.

3) Even before my new rules, there is one existing place trying to spot side branches, and it can fail here for the same reasons. See simplify_history's "even if a merge with an uninteresting side branch brought the entire change we are interested in" test. It would do the wrong thing at X, if W had made a change and Y reverted it. "git log A..B" would show W and Y, which is not what we want. As the scan hits X, it follows parent Y rather than just go for first parent A, because it thinks A is "an uninteresting side branch". Again, if there had been an intervening commit A1 before X, X would have followed A1 and W and Y would not have been shown.

All 3 cases work if A is treated as "INTERESTING" for side-branch rules. We shouldn't have needed to put in an extra A1 commit to make them work.

  #          D---E-------F
  #         /     \       \
+#    B---C-G0-G--H---I---J
  #   /                     \
  #  A-------K---------------L--M
  #


Conceptually, the "ancestry-path" shouldn't get affected by any
pathspec. The range "--ancestry-path G0..M" should be equivalent to
the range "G0..M ^F ^E ^K", and with the rule to ignore non-sameness
with uninteresting side branches, I would have expected that H and J
would be equally irrelevant, because E and F would be outside the
graph we would want to look at sameness.

Those two pathspecs produce the same graph of commits, and yes, they've always produced the same thing up until now. But we're trying to do something new(ish) here. We're trying to define "side branch", to allow us to make more useful pruning comparisons. And we can't reliably define "side branch" unless we can reliably define where we're coming from.

Looking at this case, given that graph of commits G-H-I-J-L-M (produced by any pathspec/flags), that is "easy" because it's linear. The bottom of the INTERESTING graph has a single parent, and we can follow it straight from bottom to top. We can deduce that non-merge "G" is bottom, and thus call the connections to E and F "sides". (But that could have been wrong if the graph had been made some other way, and the user wasn't asking for history since G0).

But if presented with H-I-J-L-M we get stuck. The lowest commit in our graph has 2 parents. How do we distinguish between E and G? Which is the side, and which is the bottom? We can only define "side branch" here if we know where our bottom is. The version of this patch as posted can't distinguish whether E or G is more important, so merge H always gets shown. And that makes me unhappy. And note that normal merge-simplification will often prune away boring non-merge commits, leaving the user-specified UNINTERESTING bottom attached to an INTERESTING graph by a merge commit like this. So it would be very common to stumble over this first connection onto the INTERESTING graph with --simplify-merges.

So my proposal here is that for pruning purposes, we need to mark the bottom(s), so we can reliably identify the sides.A side branch is spotted by being UNINTERESTING && !BOTTOM.

Doing this means that "--ancestry-path E..M", "--ancestry-path G..M" and "G..M ^F ^E ^K" would all produce a walk starting at H, but for pruning purposes they will prioritise differences against specified bottom commits, and discount them against non-specified boundary commits.

So "E..M" will treat "G" as a side branch, and "G..M" will treat "D-E" as a side branch. "--ancestry-path E..M" will show H iff it differs from E, and "--ancestry-path G..M" will show H iff it differs from G.

And as a result, manually specifying with "G..M ^F ^E ^K" will be paying extra attention to merges H, J and L, caring if they differ from listed commits E F and K, and not treating them as ignorable side branches. H will be shown if it differs from either G or E.

All of which feels right and good to me, but it does mean the neat mathematical purity of the commit set model is somewhat disrupted - different pathspecs specifying the same set of commits will prune differently. But I think the behaviour fits intuitive expectation well. When a user specifies "A..B" or "B ^A", they almost always mean that they want the change since A, and they’re not thinking of A as a “commit set subtraction”. The specific edge to A is important to them. So let's be helpful and meet normal expectation, and treat the edge between the INTERESTING graph and the specified bottom commit as important.

Revised patch 8/8 doing this follows. The fact it adds a BOTTOM flag potentially has an impact on other areas, as noted in the comments. If we went down this route, the BOTTOM flag addition would obviously want to be a separate preceding patch, covering impact on collect_bottom_commits etc.

And the thing about "only need merges with 2+ INTERESTING parents for topology" should really be separated out too - that's actually quite distinct from the TREESAME side branch stuff, and rather more straightforward.

Kevin

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]