Re: [RFC/PATCH 1/3] revision.c: tighten up TREESAME handling of merges

On 28/04/2013 21:38, Junio C Hamano wrote:

   @@ -773,6 +861,9 @@ static void limit_to_ancestry(struct commit_list *bottom, struct commit_list *li
   	 * NEEDSWORK: decide if we want to remove parents that are
   	 * not marked with TMP_MARK from commit->parents for commits
   	 * in the resulting list.  We may not want to do that, though.
+	 *
+	 * Maybe it should be considered if we are TREESAME to such
+	 * parents - now possible with stored per-parent flags.
   	 */
Hmm, that is certainly a thought.
My comment's wrong though. Reconsidering, what I think needs removing
is actually off-ancestry parents that we are !TREESAME to, when we are
TREESAME on the ancestry path.
I thought I read you meant exactly that, i.e. !TREESAME, but now I
re-read what is quoted, you did say "we are TREESAME" ;-).  I think
I agree with you that we do not want any side branch that is not on
the ancestry path we are interested in to affect the sameness
assigned to the merge commit.

I did a trial implementation of this in limit_to_ancestry(), and the result was lovely, but in the end I decided it's not actually the right place to do it. The logic is more general than that; this isn't just an ancestry-path issue, and I think "hiding" parents isn't the right way to go about it anyway.

To slightly generalise your own wording: I think the rule is "we do not want any side branch that is UNINTERESTING to affect the sameness assigned to the merge commit". I think that rule applies to all dense, pruned modes.

Having experimented with some of the annoyingly complex merge paths that originally prompted this series, it seems this rule makes a huge difference, and it's useful whether asking "--simplify-merges A..B <file>" or "--ancestry-path A..B <file>".

At present, either query will show lots of really boring merge commits of topic branches at the boundary, with 1 INTERESTING parent that they're TREESAME to, and 1 UNINTERESTING parent that they may or may not be TREESAME to, depending on how old the base of that topic branch was. Most such commits are of no relevance to our history whatsoever. In the case of "--simplify-merges", the fact that they're UNINTERESTING actually _prevented_ their simplification - if it had been allowed to follow the UNINTERESTING path back further, it would have reached an ancestor, and been found redundant. So limiting the rev-list actually increases the number of merges shown.

We can lose all those boring commits with these two changes:

1) Previously TREESAME was defined as "this commit matches at least 1 parent". My first patch changes it to "this commit matches all parents". It should be refined further to "this commit matches all INTERESTING parents, if it has any, else all (UNINTERESTING) parents". (Can we word that better?) Note that this fancy rule collapses to the same straightforward TREESAME check as ever for 0- or 1-parent commits.

2) simplify_merges currently will not simplify commits unless they have exactly 1 parent. That's not what we want. We only need to preserve commits that don't have exactly 1 INTERESTING parent.

Those 2 rules produce the desirable result: if we have a merge commit with exactly 1 INTERESTING parent it is TREESAME to, it is always simplified away - any other UNINTERESTING parents it may have did not affect our code, so we don't care about whether we were TREESAME to them or not, and as we don't want to see any of the UNINTERESTING parents themselves, the merge is not worth showing.
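
As a rough sketch of how the two rules combine (this is not the actual revision.c code; struct parent_info and all three function names are made up for illustration, and root commits with no parents are deliberately left out of the model):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-parent state; not git's struct commit. */
struct parent_info {
	bool uninteresting;	/* parent carries the UNINTERESTING flag */
	bool treesame;		/* commit matches this parent on the limited paths */
};

/* Count the parents that are not UNINTERESTING. */
static size_t count_interesting(const struct parent_info *p, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (!p[i].uninteresting)
			count++;
	return count;
}

/*
 * Rule 1: the commit is TREESAME if it matches all INTERESTING parents
 * when it has any, else if it matches all (UNINTERESTING) parents.
 */
static bool commit_is_treesame(const struct parent_info *p, size_t n)
{
	bool have_interesting;
	size_t i;

	assert(p != NULL && n > 0);	/* root commits are outside this model */
	have_interesting = count_interesting(p, n) > 0;

	for (i = 0; i < n; i++) {
		if (have_interesting && p[i].uninteresting)
			continue;	/* side branch outside the range: ignored */
		if (!p[i].treesame)
			return false;
	}
	return true;
}

/*
 * Rule 2: a commit may be simplified away when it has exactly one
 * INTERESTING parent and is TREESAME under rule 1.
 */
static bool can_simplify_away(const struct parent_info *p, size_t n)
{
	return count_interesting(p, n) == 1 && commit_is_treesame(p, n);
}
```

With this model, a boundary merge whose parents are { INTERESTING, TREESAME } and { UNINTERESTING, !TREESAME } is both TREESAME and simplifiable, while a merge of two INTERESTING parents that differs from one of them is neither.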

This makes a massive difference on some of my searches, reducing the total commits shown by a factor of 5 to 10, greatly improving the signal-to-noise ratio.

I'll put together a trial patch at the end of the next iteration of the series that implements this logic. I need to think a bit more - I think "get_commit_action" needs a similar INTERESTING check for merges too, to get the same sort of effect without relying on simplify_merges. Parent rewriting shouldn't necessitate keeping all merges - only merges with 2+ INTERESTING parents.


                   *   *
           .-A---M---N---O---P
          /*    /   /*  /*  /*
         I     B   C   D   E
          \   /*  /   /*  /
           `-------------'
I've added '*' next to each arc between a commit-pair whose contents
at 'foo' are different to the illustration, following the set-up the
manual describes.  E is the same as I for 'foo' and P would resolve
'foo' to be the same as O.

I think that sort of thing could be a useful patch to the docs.

Given this error, and this change, I think this example may want a
slight rethink. Do we want a proper "messing with other paths but
TREESAME merge" example? Say if E's parent was O, P would not be
TREESAME and not included in --full-history.
I am not sure if I follow your last sentence.

Do you mean this topology, where E's sole parent is O, i.e.

               E
              / \
        N---O---P
            /*
           D

and E does not change 'foo' from O?  Then P is TREESAME to all its
parents and would not have to appear in the full history for the
same reason M does not appear in your earlier IABNDOP output, no?

That's the topology I was thinking of. Yes, P is then "full-TREESAME" like M, but it's just a more typical example of a real merge and why TREESAMEness arises than M is. M didn't appear in full-history because both parents made the same change to foo - indeed both parents were identical. Whereas P wouldn't appear because E is different, but changed something other than foo.
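
To make the distinction concrete, here is a toy model of the two cases. Everything in it is an assumption for illustration: the snapshot struct, the second path called "other", and the file contents are invented, and only the equalities between them matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot: contents of 'foo' and of one other path. */
struct snapshot {
	const char *foo;
	const char *other;
};

/* TREESAME for a history limited to 'foo': only 'foo' is compared. */
static bool treesame_on_foo(const struct snapshot *commit,
			    const struct snapshot *parent)
{
	return strcmp(commit->foo, parent->foo) == 0;
}

/* "Full-TREESAME": the commit matches every parent on 'foo'. */
static bool treesame_to_all(const struct snapshot *commit,
			    const struct snapshot *parents, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!treesame_on_foo(commit, &parents[i]))
			return false;
	return true;
}

/* Whole-tree comparison, used only to contrast the two cases. */
static bool snapshots_equal(const struct snapshot *a, const struct snapshot *b)
{
	return strcmp(a->foo, b->foo) == 0 && strcmp(a->other, b->other) == 0;
}

/* M: both parents made the same change to foo and are identical. */
static const struct snapshot parents_of_M[2] = {
	{ "foo", "quux" },		/* A */
	{ "foo", "quux" },		/* B */
};
static const struct snapshot commit_M = { "foo", "quux" };

/* P: E (whose sole parent is O) changed something other than foo. */
static const struct snapshot parents_of_P[2] = {
	{ "foo-at-O", "other-v1" },	/* O */
	{ "foo-at-O", "other-v2" },	/* E */
};
static const struct snapshot commit_P = { "foo-at-O", "other-v2" };
```

Both M and P come out TREESAME to all parents on 'foo', but only M's parents are identical as whole trees; P's parents differ on the other path.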

Kevin