Re: [PATCH v3 5/9] ref-filter: use generation number for --contains

Jakub Narebski <jnareb@xxxxxxxxx> · Tue, 24 Apr 2018 20:56:40 +0200

Derrick Stolee <stolee@xxxxxxxxx> writes:
> On 4/18/2018 5:02 PM, Jakub Narebski wrote:
>> Derrick Stolee <dstolee@xxxxxxxxxxxxx> writes:
>>
>>> A commit A can reach a commit B only if the generation number of A
>>> is larger than the generation number of B. This condition allows
>>> significantly short-circuiting commit-graph walks.
>>>
>>> Use generation number for 'git tag --contains' queries.
>>>
>>> On a copy of the Linux repository where HEAD is containd in v4.13
>>> but no earlier tag, the command 'git tag --contains HEAD' had the
>>> following peformance improvement:
>>>
>>> Before: 0.81s
>>> After:  0.04s
>>> Rel %:  -95%
>>
>> A question: what is the performance after if the "commit-graph" feature
>> is disabled, or there is no commit-graph file?  Is there performance
>> regression in this case, or is the difference negligible?
>
> Negligible, since we are adding a small number of integer comparisons
> and the main cost is in commit parsing. More on commit parsing in
> response to your comments below.

If it is proven to be always negligible, then its all right.  If it is
unlikely to be non-negligible, well, still O.K.  But I wonder if maybe
there is some situation where the cost of extra parsing is non-negligble.

[...]
>>>   @@ -1618,8 +1623,18 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
>>>   					      struct contains_cache *cache)
>>>   {
>>>   	struct contains_stack contains_stack = { 0, 0, NULL };
>>> -	enum contains_result result = contains_test(candidate, want, cache);
>>> +	enum contains_result result;
>>> +	uint32_t cutoff = GENERATION_NUMBER_INFINITY;
>>> +	const struct commit_list *p;
>>> +
>>> +	for (p = want; p; p = p->next) {
>>> +		struct commit *c = p->item;
>>> +		parse_commit_or_die(c);
>>> +		if (c->generation < cutoff)
>>> +			cutoff = c->generation;
>>> +	}
>> Sholdn't the above be made conditional on the ability to get generation
>> numbers from the commit-graph file (feature is turned on and file
>> exists)?  Otherwise here after the change contains_tag_algo() now parses
>> each commit in 'want', which I think was not done previously.
>>
>> With commit-graph file parsing is [probably] cheap.  Without it, not
>> necessary.
>>
>> But I might be worrying about nothing.
>
> Not nothing. This parses the "wants" when we previously did not parse
> the wants. Further: this parsing happens before we do the simple check
> of comparing the OID of the candidate against the wants.
>
> The question is: are these parsed commits significant compared to the
> walk that will parse many more commits? It is certainly possible.
>
> One way to fix this is to call 'prepare_commit_graph()' directly and
> then test that 'commit_graph' is non-null before performing any
> parses. I'm not thrilled with how that couples the commit-graph
> implementation to this feature, but that may be necessary to avoid
> regressions in the non-commit-graph case.

Another possible solution (not sure if better or worse) would be to
change the signature of contains_tag_algo() function to take parameter
or flag that would decide whether to parse "wants".

Best,
-- 
Jakub Narębski