On 28/04/2013 21:38, Junio C Hamano wrote:
@@ -773,6 +861,9 @@ static void limit_to_ancestry(struct commit_list *bottom, struct commit_list *li
  * NEEDSWORK: decide if we want to remove parents that are
  * not marked with TMP_MARK from commit->parents for commits
  * in the resulting list. We may not want to do that, though.
+ *
+ * Maybe it should be considered if we are TREESAME to such
+ * parents - now possible with stored per-parent flags.
  */
Hmm, that is certainly a thought.
My comment's wrong though. Reconsidering, what I think needs removing
is actually off-ancestry parents that we are !TREESAME to, when we are
TREESAME on the ancestry path.
I thought I read you meant exactly that, i.e. !TREESAME, but now that I
re-read what is quoted, you did say "we are TREESAME" ;-). I think
I agree with you that we do not want any side branch that is not on
the ancestry path we are interested in to affect the sameness
assigned to the merge commit.
I did a trial implementation of this in limit_to_ancestry(), and the
result was lovely, but in the end I decided it's not actually the right
place to do it. The logic is more general than that; this isn't just an
ancestry-path issue, and I think "hiding" parents isn't the right way to
go about it anyway.
To slightly generalise your own wording: I think the rule is "we do not
want any side branch that is UNINTERESTING to affect the sameness
assigned to the merge commit". I think that rule applies to all dense,
pruned modes.
Having experimented with some of the annoyingly complex merge paths that
originally prompted this series, it seems this rule makes a huge
difference, and it's useful whether asking "--simplify-merges A..B
<file>" or "--ancestry-path A..B <file>".
At present, either query will show lots of really boring merge commits
of topic branches at the boundary, with 1 INTERESTING parent that
they're TREESAME to, and 1 UNINTERESTING parent that they may or may
not be TREESAME to, depending on how old the base of that topic branch
was. Most such commits are of no relevance to our history whatsoever. In
the case of "--simplify-merges", the fact that they're UNINTERESTING
actually _prevented_ their simplification - if it had been allowed to
follow the UNINTERESTING path back further, it would have reached an
ancestor, and been found redundant. So limiting the rev-list actually
increases the number of merges shown.
We can lose all those boring commits with these two changes:
1) Previously TREESAME was defined as "this commit matches at least 1
parent". My first patch changes it to "this commit matches all parents".
It should be refined further to "this commit matches all INTERESTING
parents, if it has any, else all (UNINTERESTING) parents". (Can we word
that better?) Note that this fancy rule collapses to the same
straightforward TREESAME check as ever for 0- or 1-parent commits.
2) simplify_merges currently will not simplify commits unless they have
exactly 1 parent. That's not what we want. We only need to preserve
commits that don't have exactly 1 INTERESTING parent.
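To compare the three wordings of TREESAME in rule 1, here is a rough sketch in Python. It is only a toy model, not the code in revision.c: a parent is reduced to a hypothetical (same_tree, interesting) pair, the function names are made up, and root commits are left out.

```python
from typing import List, Tuple

# Toy model, not git's data structures: a parent is (same_tree, interesting).
# same_tree:   the commit matches this parent on the paths being followed.
# interesting: the parent is not marked UNINTERESTING.
Parent = Tuple[bool, bool]


def treesame_any(parents: List[Parent]) -> bool:
    """Previous definition: the commit matches at least 1 parent."""
    return any(same for same, _ in parents)


def treesame_all(parents: List[Parent]) -> bool:
    """First patch: the commit matches all parents."""
    return all(same for same, _ in parents)


def treesame_refined(parents: List[Parent]) -> bool:
    """Proposed refinement: the commit matches all INTERESTING parents if
    it has any, otherwise all (UNINTERESTING) parents."""
    interesting = [p for p in parents if p[1]]
    relevant = interesting or parents
    return all(same for same, _ in relevant)


# Boundary merge of a topic branch: same as its INTERESTING parent,
# different from its UNINTERESTING parent.
boundary_merge = [(True, True), (False, False)]
print(treesame_any(boundary_merge),
      treesame_all(boundary_merge),
      treesame_refined(boundary_merge))  # prints: True False True
```

For any commit with exactly 1 parent the three functions agree, which is the "collapses to the same check" property mentioned in rule 1.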
Those 2 rules produce the desirable result: if we have a merge commit
with exactly 1 INTERESTING parent it is TREESAME to, it is always
simplified away - any other UNINTERESTING parents it may have did not
affect our code, so we don't care about whether we were TREESAME to them
or not, and as we don't want to see any of the UNINTERESTING parents
themselves, the merge is not worth showing.
This makes a massive difference on some of my searches, reducing the
total commits shown by a factor of 5 to 10, greatly improving the
signal-to-noise ratio.
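As a rough illustration of how the two rules combine, here is a small Python sketch. It takes the wording above literally and ignores parent rewriting; ParentLink and both function names are hypothetical, not anything in git.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParentLink:
    # Hypothetical stand-in for a per-parent flag plus the parent's state.
    same_tree: bool    # commit matches this parent on the followed paths
    interesting: bool  # parent is not UNINTERESTING


def old_can_simplify(parents: List[ParentLink]) -> bool:
    """Behaviour as described for today: only 1-parent commits go away."""
    return len(parents) == 1 and parents[0].same_tree


def new_can_simplify(parents: List[ParentLink]) -> bool:
    """Proposed: exactly 1 INTERESTING parent, and TREESAME to it."""
    interesting = [p for p in parents if p.interesting]
    return len(interesting) == 1 and interesting[0].same_tree


# A boring topic-branch merge at the boundary.
boundary_merge = [ParentLink(same_tree=True, interesting=True),
                  ParentLink(same_tree=False, interesting=False)]

# A merge of two INTERESTING lines of history that differ on the path.
real_merge = [ParentLink(same_tree=True, interesting=True),
              ParentLink(same_tree=False, interesting=True)]

print(old_can_simplify(boundary_merge), new_can_simplify(boundary_merge))
print(old_can_simplify(real_merge), new_can_simplify(real_merge))
```

In this model the boundary merge is kept by the old check and dropped by the new one, while the merge with 2 INTERESTING parents is kept by both.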
I'll put together a trial patch at the end of the next iteration of the
series that implements this logic. I need to think a bit more - I think
"get_commit_action" needs a similar INTERESTING check for merges too, to
get the same sort of effect without relying on simplify_merges. Parent
rewriting shouldn't necessitate keeping all merges - only merges with 2+
INTERESTING parents.
              *   *
      .-A---M---N---O---P
     /*    /   /*  /*  /*
    I     B   C   D   E
     \   /*  /   /*  /
      `-------------'
I've added '*' to the illustration next to each arc between a pair of
commits whose contents at 'foo' differ, following the set-up the
manual describes. E is the same as I for 'foo' and P would resolve
'foo' to be the same as O.
I think that sort of thing could be a useful patch to the docs.
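The starred arcs can be cross-checked mechanically. The Python sketch below assumes a set of 'foo' contents for each commit; the strings are placeholders picked to fit the set-up described (only equality between commits matters), and the function names are made up.

```python
from typing import Dict, List, Tuple

# Assumed contents of 'foo' in each commit. Placeholder strings only:
# what matters is which commits are equal to which.
FOO: Dict[str, str] = {
    "I": "asdf",
    "A": "foo", "B": "foo", "M": "foo",
    "C": "asdf", "N": "foobar",
    "D": "baz", "O": "foobarbaz",
    "E": "asdf", "P": "foobarbaz",
}

# Parents of each commit in the illustration.
PARENTS: Dict[str, List[str]] = {
    "A": ["I"], "B": ["I"], "C": ["I"], "D": ["I"], "E": ["I"],
    "M": ["A", "B"], "N": ["M", "C"], "O": ["N", "D"], "P": ["O", "E"],
}


def starred_arcs() -> List[Tuple[str, str]]:
    """(parent, child) pairs whose 'foo' differs, i.e. the '*' arcs."""
    return sorted((parent, child)
                  for child, parents in PARENTS.items()
                  for parent in parents
                  if FOO[parent] != FOO[child])


def treesame_parents(commit: str) -> List[str]:
    """Parents whose 'foo' is identical to the commit's."""
    return [p for p in PARENTS[commit] if FOO[p] == FOO[commit]]


for parent, child in starred_arcs():
    print(f"{parent} -> {child} *")
```

With these assumed contents, M matches both of its parents, N matches neither, and P matches O but not E.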
Given this error, and this change, I think this example may want a
slight rethink. Do we want a proper "messing with other paths but
TREESAME merge" example? Say if E's parent was O, P would not be
TREESAME and not included in --full-history.
I am not sure if I follow your last sentence.
Do you mean this topology, where E's sole parent is O, i.e.
            E
           / \
      N---O---P
         /*
        D
and E does not change 'foo' from O? Then P is TREESAME to all its
parents and would not have to appear in the full history for the
same reason M does not appear in your earlier IABNDOP output, no?
That's the topology I was thinking of. Yes, P is then "full-TREESAME"
like M, but it's just a more typical example of a real merge and why
TREESAMEness arises than M is. M didn't appear in full-history because
both parents made the same change to foo - indeed both parents were
identical. Whereas P wouldn't appear because E is different, but changed
something other than foo.
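To make the distinction concrete, here is a toy Python model of that topology. The trees are invented for illustration: 'other' stands for any path that is not 'foo', and all contents and helper names are assumptions.

```python
from typing import Dict, List

Tree = Dict[str, str]

# Hypothetical trees for the topology where E's sole parent is O.
O: Tree = {"foo": "foobarbaz", "other": "v1"}
E: Tree = {"foo": "foobarbaz", "other": "v2"}  # changes something else
P: Tree = {"foo": "foobarbaz", "other": "v2"}  # merge of O and E


def same_on(path: str, a: Tree, b: Tree) -> bool:
    """True if both trees have identical content at path."""
    return a.get(path) == b.get(path)


def treesame_to_all(path: str, commit: Tree, parents: List[Tree]) -> bool:
    """True if the commit matches every parent at path."""
    return all(same_on(path, commit, parent) for parent in parents)


print(E != O)                            # E is a real change overall
print(treesame_to_all("foo", P, [O, E]))    # but P matches both on 'foo'
print(treesame_to_all("other", P, [O, E]))  # and not on the other path
```

So P is TREESAME to all its parents when the history is limited to 'foo', even though its parents are different commits with different trees, which is the case M does not illustrate.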
Kevin