Re: [PATCH v3 7/9] commit: add short-circuit to paint_down_to_common()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/23/2018 5:38 PM, Jakub Narebski wrote:
Derrick Stolee <stolee@xxxxxxxxx> writes:

On 4/18/2018 7:19 PM, Jakub Narebski wrote:
Derrick Stolee <dstolee@xxxxxxxxxxxxx> writes:

[...]
[...], and this saves time during 'git branch --contains' queries
that would otherwise walk "around" the commit we are inspecting.

If I understand the code properly, what happens is that we can now
short-circuit if all commits that are left are lower than the target
commit.

This is because max-order priority queue is used: if the commit with
maximum generation number is below generation number of target commit,
then target commit is not reachable from any commit in the priority
queue (all of which has generation number less or equal than the commit
at head of queue, i.e. all are same level or deeper); compare what I
have written in [1]

[1]: https://public-inbox.org/git/866052dkju.fsf@xxxxxxxxx/

Do I have that right?  If so, it looks all right to me.
Yes, the priority queue needs to compare via generation number first
or there will be errors. This is why we could not use commit time
before.
I was more concerned about getting right the order in the priority queue
(does it return minimal or maximal generation number).

I understand that the cutoff could not be used without generation
numbers because of the possibility of clock skew - using cutoff on dates
could lead to wrong results.

Maximal generation number is important so we do not visit commits multiple times (say, once with PARENT1 set, and a second time when PARENT2 is set). A minimal generation number order would create a DFS order and walk until the cutoff every time.

In cases without clock skew, maximal generation number order will walk the same set of commits as maximal commit time; the order may differ, but only between incomparable commits.

For a copy of the Linux repository, where HEAD is checked out at
v4.13~100, we get the following performance improvement for
'git branch --contains' over the previous commit:

Before: 0.21s
After:  0.13s
Rel %: -38%
[...]
   		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
   		if (flags == (PARENT1 | PARENT2)) {
   			if (!(commit->object.flags & RESULT)) {
@@ -876,7 +886,7 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
   			return NULL;
   	}
   -	list = paint_down_to_common(one, n, twos);
+	list = paint_down_to_common(one, n, twos, 0);
     	while (list) {
   		struct commit *commit = pop_commit(&list);
@@ -943,7 +953,7 @@ static int remove_redundant(struct commit **array, int cnt)
   			filled_index[filled] = j;
   			work[filled++] = array[j];
   		}
-		common = paint_down_to_common(array[i], filled, work);
+		common = paint_down_to_common(array[i], filled, work, 0);
   		if (array[i]->object.flags & PARENT2)
   			redundant[i] = 1;
   		for (j = 0; j < filled; j++)
@@ -1067,7 +1077,7 @@ int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit *
   	if (commit->generation > min_generation)
   		return 0;
   -	bases = paint_down_to_common(commit, nr_reference, reference);
+	bases = paint_down_to_common(commit, nr_reference, reference, commit->generation);
Is it the only case where we would call paint_down_to_common() with
non-zero last parameter?  Would we always use commit->generation where
commit is the first parameter of paint_down_to_common()?

If both are true and will remain true, then in my humble opinion it is
not necessary to change the signature of this function.
We need to change the signature some way, but maybe the way I chose is
not the best.
No, after taking longer I think the new signature is a good choice.

To elaborate: paint_down_to_common() is used for multiple
purposes. The caller here that supplies 'commit->generation' is used
only to compute reachability (by testing if the flag PARENT2 exists on
the commit, then clears all flags). The other callers expect the full
walk down to the common commits, and keeps those PARENT1, PARENT2, and
STALE flags for future use (such as reporting merge bases). Usually
the call to paint_down_to_common() is followed by a revision walk that
only halts when reaching root commits or commits with both PARENT1 and
PARENT2 flags on, so always short-circuiting on generations would
break the functionality; this is confirmed by the
t5318-commit-graph.sh.
Right.

I have realized that just after sending the email.  I'm sorry about this.

An alternative to the signature change is to add a boolean parameter
"use_cutoff" or something, that specifies "don't walk beyond the
commit". This may give a more of a clear description of what it will
do with the generation value, but since we are already performing
generation comparisons before calling paint_down_to_common() I find
this simple enough.
Two things:

1. The signature proposed in the patch is more generic.  The cutoff does
    not need to be equal to the generation number of the commit, though
    currently it always (all of one time the new mechanism is used) is.

    So now I think the new signature of paint_down_to_common() is all
    right as it is proposed here.

2. The way generation numbers are defined (with 0 being a special case,
    and generation numbers starting from 1 for parent-less commits), and
    the way they are compared (using strict comparison, to avoid having
    to special-case _ZERO, _MAX and _INFINITY generation numbers) the
    cutoff of 0 means no cutoff.

    On the other hand cutoff of 0 can be understood as meaning no cutoff
    as a special case.

    It could be made more clear to use (as I proposed elsewhere in this
    thread) symbolic name for this no-cutoff case via preprocessor
    constants or enums, e.g. GENERATION_NO_CUTOFF:

     @@ -876,7 +886,7 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
       			return NULL;
       	}
       -	list = paint_down_to_common(one, n, twos);
     +	list = paint_down_to_common(one, n, twos, GENERATION_NO_CUTOFF);
         	while (list) {
       		struct commit *commit = pop_commit(&list);
     @@ -943,7 +953,7 @@ static int remove_redundant(struct commit **array, int cnt)
       			filled_index[filled] = j;
       			work[filled++] = array[j];
       		}
     -		common = paint_down_to_common(array[i], filled, work);
     +		common = paint_down_to_common(array[i], filled, work, GENERATION_NO_CUTOFF);
       		if (array[i]->object.flags & PARENT2)
       			redundant[i] = 1;
       		for (j = 0; j < filled; j++)


    But whether it makes code more readable, or less readable, is a
    matter of opinion and taste.


Since paint_down_to_common() is static to this file, I think 0 is cleaner. If the method was external and used by other .c files, then I would use this macro trick to clarify "what does this zero parameter mean?".

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux