Kevin Bracey <kevin@xxxxxxxxx> writes: > On 01/05/2013 00:18, Junio C Hamano wrote: > >>> These rules paying more attention to UNINTERESTING do add a tricky >>> wrinkle to behaviour. Because limited revision lists are conventionally >>> expressed as A..B (ie B ^A), the bottom commit is UNINTERESTING. >> OK. >> >>> Thus >>> its connection to the INTERESTING graph is not privileged over side >>> branches, >> I take that "its connection" refers to the "===" link below, the >> nodes connected with "---" form the "INTERESTING graph", and >> >> ....Z...A===X---o---o---B >> \\ / >> W---Y >> "side branches" refer to the development that built W and Y and >> merged at X. And you are saying that A===X is not "privileged" over >> "Y---X", with some definition of "privileged" I am not sure about. > > Okay, that's a good graph. The basic problem is that all the rules > above try to identify a merge from an irrelevant branch and eliminate > it. But how can we define what a side branch is? > > I think the rules I state are conceptually sound - a side branch is an > extra parent coming in from outside our limited history graph. But the > problem is at the bottom. In the event that someone specifies "A..B" > with the above history, we get the resultant graph "W-Y-X-o-o-B". A is > not on that graph. So with the rules as they stand, "A" being > UNINTERESTING makes it get treated as a side branch of X, which isn't > good. A needs to be INTERESTING for the purpose of side-branch logic. > > So when someone says "A..B" and generates "W-Y-X-o-o-B", we want to > know that X's parent path "(Z)-W-Y-X" is the (possibly irrelevant) > side branch, not "(A)-X". OK, I think I understand it, and we are in agreement. For the purpose of the above paragraph, a side branch is what is not on the "--ancestry-path", so all of the below "examples" are about the behaviour when --ancestry-path is used. > Example undesirable behaviour, with A treated as a side branch: > ... > 1) If only commit A changed our file, and merge X took "old" version Y > for some reason, under these rules "--full-history A..B" would show > nothing. X doesn't consider A because it's UNINTERESTING. If there had > been an intervening (even irrelevant) commit A1 between A1 and X, X > would have been shown. This is exactly why I hinted that ancestry-path limiting may need to come before path limiting decides which parents are interesting and which are irrelevant, and I think it applies the same for the other two examples. I haven't thought through the implications of the definition of "INTERESTING" you propose (without changing the rest of the code to apply ancestry-path first, I assume). While I'd like to see a solution that avoids to actually limit with ancestry-path before limiting with pathspec (because I would imagine that the most naive way to do so is to scan the graph twice, first without pathspec to find and cull side branches from "ancestry-path" point of view and then traverse the resulting graph again with pathspec to further drop branches that did not contribute to the end result), I am fairly sure that the end result we produce, whatever its more efficient implementation would end up to be, _should_ be equilvalent to the "cull with ancestry path first and then ignore noncontributing branches with pathspec" semantics. It is not immediately clear to me that your proposed implementation will give us that semantics. Oh, also, even though I said "I am fairly sure" above, you may already have a counter-example that shows why the semantics I think is right is not useful in some cases. I'll have to think about this a bit more, but it is unlikely I have time to do so for the coming couple of days. Sorry about that. > All 3 cases work if A is treated as "INTERESTING" for side-branch > rules. We shouldn't have needed to put in an extra A1 commit to make > them work. > >>> # D---E-------F >>> # / \ \ >>> +# B---C-G0-G--H---I---J >>> # / \ >>> # A-------K---------------L--M >>> # >>> >>> >>> Conceptually, the "ancestry-path" shouldn't get affected by any >>> pathspec. The range "--ancestry-path G0..M" should be equivalent to >>> the range "G0..M ^F ^E ^K", and with the rule to ignore non-sameness >>> with uninteresting side branches, I would have expected that H and J >>> would be equally irrelevant, because E and F would be outside the >>> graph we would want to look at sameness. > > Those two pathspecs produce the same graph of commits, and yes, > they've always produced the same thing up until now. But we're trying > to do something new(ish) here. We're trying to define "side branch", > to allow us to make more useful pruning comparisons. And we can't > reliably define "side branch" unless we can reliably define where > we're coming from. > > Looking at this case, given that graph of commits G-H-I-J-L-M > (produced by any pathspec/flags), that is "easy" because it's > linear. The bottom of the INTERESTING graph has a single parent, and > we can follow it straight from bottom to top. We can deduce that > non-merge "G" is bottom, and thus call the connections to E and F > "sides". (But that could have been wrong if the graph had been made > some other way, and the user wasn't asking for history since G0). > > But if presented with H-I-J-L-M we get stuck. The lowest commit in our > graph has 2 parents. How do we distinguish between E and G? Which is > the side, and which is the bottom? We can only define "side branch" > here if we know where our bottom is. The version of this patch as > posted can't distinguish whether E or G is more important, so merge H > always gets shown. And that makes me unhappy. And note that normal > merge-simplification will often prune away boring non-merge commits, > leaving the user-specified UNINTERESTING bottom attached to an > INTERESTING graph by a merge commit like this. So it would be very > common to stumble over this first connection onto the INTERESTING > graph with --simplify-merges. > > So my proposal here is that for pruning purposes, we need to mark the > bottom(s), so we can reliably identify the sides.A side branch is > spotted by being UNINTERESTING && !BOTTOM. > > Doing this means that "--ancestry-path E..M", "--ancestry-path G..M" > and "G..M ^F ^E ^K" would all produce a walk starting at H, but for > pruning purposes they will prioritise differences against specified > bottom commits, and discount them against non-specified boundary > commits. > > So "E..M" will treat "G" as a side branch, and "G..M" will treat "D-E" > as a side branch. "--ancestry-path E..M" will show H iff it differs > from E, and "--ancestry-path G..M" will show H iff it differs from G. > > And as a result, manually specifying with "G..M ^F ^E ^K" will be > paying extra attention to merges H, J and L, caring if they differ > from listed commits E F and K, and not treating them as ignorable side > branches. H will be shown if it differs from either G or E. > > All of which feels right and good to me, but it does mean the neat > mathematical purity of the commit set model is somewhat disrupted - > different pathspecs specifying the same set of commits will prune > differently. But I think the behaviour fits intuitive expectation > well. When a user specifies "A..B" or "B ^A", they almost always mean > that they want the change since A, and they’re not thinking of A as a > “commit set subtraction”. The specific edge to A is important to > them. So let's be helpful and meet normal expectation, and treat the > edge between the INTERESTING graph and the specified bottom commit as > important. > > Revised patch 8/8 doing this follows. The fact it adds a BOTTOM flag > potentially has an impact on other areas, as noted in the comments. If > we went down this route, the BOTTOM flag addition would obviously want > to be a separate preceding patch, covering impact on > collect_bottom_commits etc. > > And the thing about "only need merges with 2+ INTERESTING parents for > topology" should really be separated out too - that's actually quite > distinct from the TREESAME side branch stuff, and rather more > straightforward. > > Kevin -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html