On 01/05/2013 00:18, Junio C Hamano wrote:
These rules paying more attention to UNINTERESTING do add a tricky
wrinkle to behaviour. Because limited revision lists are conventionally
expressed as A..B (ie B ^A), the bottom commit is UNINTERESTING.
OK.
Thus
its connection to the INTERESTING graph is not privileged over side
branches,
I take that "its connection" refers to the "===" link below, the
nodes connected with "---" form the "INTERESTING graph", and
....Z...A===X---o---o---B
\\ /
W---Y
"side branches" refer to the development that built W and Y and
merged at X. And you are saying that A===X is not "privileged" over
"Y---X", with some definition of "privileged" I am not sure about.
Okay, that's a good graph. The basic problem is that all the rules above
try to identify a merge from an irrelevant branch and eliminate it. But
how can we define what a side branch is?
I think the rules I state are conceptually sound - a side branch is an
extra parent coming in from outside our limited history graph. But the
problem is at the bottom. In the event that someone specifies "A..B"
with the above history, we get the resultant graph "W-Y-X-o-o-B". A is
not on that graph. So with the rules as they stand, "A" being
UNINTERESTING makes it get treated as a side branch of X, which isn't
good. A needs to be INTERESTING for the purpose of side-branch logic.
So when someone says "A..B" and generates "W-Y-X-o-o-B", we want to know
that X's parent path "(Z)-W-Y-X" is the (possibly irrelevant) side
branch, not "(A)-X".
Example undesirable behaviour, with A treated as a side branch:
1) If only commit A changed our file, and merge X took "old" version Y
for some reason, under these rules "--full-history A..B" would show
nothing. X doesn't consider A because it's UNINTERESTING. If there had
been an intervening (even irrelevant) commit A1 between A1 and X, X
would have been shown.
2) If only commit A changed our file, and merge X took A, with this
rules"--simplify-merges A..B" would show X, with two rewritten
UNINTERESTING parents A and Z. That's not what we want - Z is an
irrelevant side-branch in this case. If there had been an intervening
(unsimplifiable) commit A1 between A and X, X would have had INTERESTING
parent A1, and WY would have been successfully discarded.
3) Even before my new rules, there is one existing place trying to spot
side branches, and it can fail here for the same reasons. See
simplify_history's "even if a merge with an uninteresting side branch
brought the entire change we are interested in" test. It would do the
wrong thing at X, if W had made a change and Y reverted it. "git log
A..B" would show W and Y, which is not what we want. As the scan hits X,
it follows parent Y rather than just go for first parent A, because it
thinks A is "an uninteresting side branch". Again, if there had been an
intervening commit A1 before X, X would have followed A1 and W and Y
would not have been shown.
All 3 cases work if A is treated as "INTERESTING" for side-branch rules.
We shouldn't have needed to put in an extra A1 commit to make them work.
# D---E-------F
# / \ \
+# B---C-G0-G--H---I---J
# / \
# A-------K---------------L--M
#
Conceptually, the "ancestry-path" shouldn't get affected by any
pathspec. The range "--ancestry-path G0..M" should be equivalent to
the range "G0..M ^F ^E ^K", and with the rule to ignore non-sameness
with uninteresting side branches, I would have expected that H and J
would be equally irrelevant, because E and F would be outside the
graph we would want to look at sameness.
Those two pathspecs produce the same graph of commits, and yes, they've
always produced the same thing up until now. But we're trying to do
something new(ish) here. We're trying to define "side branch", to allow
us to make more useful pruning comparisons. And we can't reliably define
"side branch" unless we can reliably define where we're coming from.
Looking at this case, given that graph of commits G-H-I-J-L-M (produced
by any pathspec/flags), that is "easy" because it's linear. The bottom
of the INTERESTING graph has a single parent, and we can follow it
straight from bottom to top. We can deduce that non-merge "G" is bottom,
and thus call the connections to E and F "sides". (But that could have
been wrong if the graph had been made some other way, and the user
wasn't asking for history since G0).
But if presented with H-I-J-L-M we get stuck. The lowest commit in our
graph has 2 parents. How do we distinguish between E and G? Which is the
side, and which is the bottom? We can only define "side branch" here if
we know where our bottom is. The version of this patch as posted can't
distinguish whether E or G is more important, so merge H always gets
shown. And that makes me unhappy. And note that normal
merge-simplification will often prune away boring non-merge commits,
leaving the user-specified UNINTERESTING bottom attached to an
INTERESTING graph by a merge commit like this. So it would be very
common to stumble over this first connection onto the INTERESTING graph
with --simplify-merges.
So my proposal here is that for pruning purposes, we need to mark the
bottom(s), so we can reliably identify the sides.A side branch is
spotted by being UNINTERESTING && !BOTTOM.
Doing this means that "--ancestry-path E..M", "--ancestry-path G..M" and
"G..M ^F ^E ^K" would all produce a walk starting at H, but for pruning
purposes they will prioritise differences against specified bottom
commits, and discount them against non-specified boundary commits.
So "E..M" will treat "G" as a side branch, and "G..M" will treat "D-E"
as a side branch. "--ancestry-path E..M" will show H iff it differs from
E, and "--ancestry-path G..M" will show H iff it differs from G.
And as a result, manually specifying with "G..M ^F ^E ^K" will be paying
extra attention to merges H, J and L, caring if they differ from listed
commits E F and K, and not treating them as ignorable side branches. H
will be shown if it differs from either G or E.
All of which feels right and good to me, but it does mean the neat
mathematical purity of the commit set model is somewhat disrupted -
different pathspecs specifying the same set of commits will prune
differently. But I think the behaviour fits intuitive expectation well.
When a user specifies "A..B" or "B ^A", they almost always mean that
they want the change since A, and they’re not thinking of A as a “commit
set subtraction”. The specific edge to A is important to them. So let's
be helpful and meet normal expectation, and treat the edge between the
INTERESTING graph and the specified bottom commit as important.
Revised patch 8/8 doing this follows. The fact it adds a BOTTOM flag
potentially has an impact on other areas, as noted in the comments. If
we went down this route, the BOTTOM flag addition would obviously want
to be a separate preceding patch, covering impact on
collect_bottom_commits etc.
And the thing about "only need merges with 2+ INTERESTING parents for
topology" should really be separated out too - that's actually quite
distinct from the TREESAME side branch stuff, and rather more
straightforward.
Kevin
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html