Re: [PATCH v2 06/10] commit.c: use generation to halt paint walk

Derrick Stolee <stolee@xxxxxxxxx> · Wed, 11 Apr 2018 09:24:01 -0400




On 4/10/2018 11:02 PM, Junio C Hamano wrote:
Derrick Stolee <dstolee@xxxxxxxxxxxxx> writes:

@@ -800,17 +810,26 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n, struc
  		return result;
  	}
  	prio_queue_put(&queue, one);
+	if (one->generation < min_nonstale_gen)
+		min_nonstale_gen = one->generation;
  
  	for (i = 0; i < n; i++) {
  		twos[i]->object.flags |= PARENT2;
  		prio_queue_put(&queue, twos[i]);
+		if (twos[i]->generation < min_nonstale_gen)
+			min_nonstale_gen = twos[i]->generation;
  	}
  
-	while (queue_has_nonstale(&queue)) {
+	while (queue_has_nonstale(&queue, min_nonstale_gen)) {
  		struct commit *commit = prio_queue_get(&queue);
  		struct commit_list *parents;
  		int flags;
  
+		if (commit->generation > last_gen)
+			BUG("bad generation skip");
+
+		last_gen = commit->generation;
+
  		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
  		if (flags == (PARENT1 | PARENT2)) {
  			if (!(commit->object.flags & RESULT)) {
@@ -830,6 +849,10 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n, struc
  				return NULL;
  			p->object.flags |= flags;
Hmph.  Can a commit that used to be not stale (and contributed to
the current value of min_nonstale_gen) become stale here by getting
visited twice, invalidating the value in min_nonstale_gen?

min_nonstale_gen can be "wrong" in the way you say, but fits the 
definition from the commit message:

"To properly take advantage of this condition, track the minimum 
generation number of a commit that **enters the queue** with nonstale 
status." (Emphasis added)

You make an excellent point about how this can be problematic. I was 
confused by the lack of clear performance benefits here, but I think 
that whatever benefits making queue_has_nonstale() be O(1) were removed 
by walking more commits than necessary.

Consider the following commit graph, where M is a parent of both A and 
B, S is a parent of M and B, and there is a large set of commits 
reachable from M with generation number larger than gen(S).

A    B
| __/|
|/   |
M    |
|\   |
. |  |
. |  |
. |_/
|/
S

Between A and B, the true merge base is M. Anything reachable from M is 
marked as stale. When S is added to the queue, it is only reachable from 
B, so it is non-stale. However, it is marked stale after M is walked. 
The old code would detect this as a termination condition, but the new 
code would not.

I think this data shape is actually common (not exactly, as it may be 
that some ancestor of M provides a second path to S) especially in the 
world of pull requests and users merging master into their topic branches.

I'll remove this commit in the next version, but use the new prototype 
for queue_has_nonstale() in "commit: add short-circuit to 
paint_down_to_common()" using the given 'min_generation' instead of 
'min_nonstale_gen'.

Thanks,
-Stolee