Re: [PATCH] name-rev: use generation numbers if available

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/28/2022 11:50 AM, Derrick Stolee wrote:
> On 2/28/2022 2:07 PM, Jacob Keller wrote:
>> From: Jacob Keller <jacob.keller@xxxxxxxxx>
>>
>> If a commit in a sequence of linear history has a non-monotonically
>> increasing commit timestamp, git name-rev might not properly name the
>> commit.
>>
>> This occurs because name-rev uses a heuristic of the commit date to
>> avoid searching down tags which lead to commits that are older than the
>> named commit. This is intended to avoid work on larger repositories.
>>
>> This heuristic impacts git name-rev, and by extension git describe
>> --contains which is built on top of name-rev.
>>
>> Further more, if --annotate-stdin is used, the heuristic is not enabled
>> because the full history has to be analyzed anyways. This results in
>> some confusion if a user sees that --annotate-stdin works but a normal
>> name-rev does not.
>>
>> If the repository has a commit graph, we can use the generation numbers
>> instead of using the commit dates. This is essentially the same check
>> except that generation numbers make it exact, where the commit date
>> heuristic could be incorrect due to clock errors.
>>
>> Add a test case which covers this behavior and shows how the commit
>> graph makes the name-rev process work.
>>
>> Signed-off-by: Jacob Keller <jacob.keller@xxxxxxxxx>
>> ---
>> The initial implementation of this came from [1]. Should this have Stolee's
>> sign-off?
>>
>> [1]: https://lore.kernel.org/git/42d2a9fe-a3f2-f841-dcd1-27a0440521b6@xxxxxxxxxx/
> 
> I think your implementation is sufficiently different (and better)
> that you don't need my co-authorship _or_ sign-off.
> 

Alright.

>> +static void set_commit_cutoff(struct commit *commit)
>> +{
>> +	timestamp_t generation;
>> +
>> +	if (cutoff > commit->date)
>> +		cutoff = commit->date;
>> +
>> +	generation = commit_graph_generation(commit);
>> +	if (generation_cutoff > generation)
>> +		generation_cutoff = generation;
>> +}
> 
> I appreciate that you split this out into its own method to isolate
> the logic.

Yea I noticed the cuttoff check in a few places and it was getting
confusing to see the multiple line check. It also made it easier to
format this whole function nicely. Once I did that, I thought it made
more sense to split as much of the accesses to the globals into separate
functions as possible.

> 
>> +/* Check if a commit is before the cutoff. Prioritize generation numbers
>> + * first, but use the commit timestamp if we lack generation data.
>> + */
>> +static int commit_is_before_cutoff(struct commit *commit)
>> +{
>> +	if (generation_cutoff < GENERATION_NUMBER_INFINITY)
>> +		return commit_graph_generation(commit) < generation_cutoff;
>> +
>> +	return commit->date < cutoff;
>> +}
> 
> There are two subtle things going on here when generation_cutoff is
> zero:
> 
> 1. In a commit-graph with topological levels _or_ generation numbers v2,
>    commit_graph_generation(commit) will always be positive, so we don't
>    need to do the lookup.

I.e. once we have a generation_cutoff of 0 we can just completely bypass
the lookup, saving some time.

I think we can do "return generation_cutoff &&
commit_graph_generation(commit) < generation_cutoff"

> 
> 2. If the commit-graph was written by an older Git version before
>    topological levels were implemented, then the generation of commits
>    in the commit-graph are all zero(!). This means that the logic here
>    would be incorrect for the "all" case.
> 
> The fix for both cases is to return 1 if generation_cutoff is zero:
> 

I think you mean return 0? Because this returns true if the commit is
before the cutoff, but false if its not. (i.e. if its true, we should
stop searching this commit, but if its false we should continue searching?



> 	if (!generaton_cutoff)
> 		return 1;
>> -	if (start_commit->date < cutoff)
>> +	if (commit_is_before_cutoff(start_commit))
> 
>> -			if (parent->date < cutoff)
>> +			if (commit_is_before_cutoff(parent))
> 
> Nice replacements.
> 
>> -	if (all || annotate_stdin)
>> +	if (all || annotate_stdin) {
>> +		generation_cutoff = 0;
>>  		cutoff = 0;
>> +	}
> 
> Good.
> 
>> -		if (commit) {
>> -			if (cutoff > commit->date)
>> -				cutoff = commit->date;
>> -		}
>> +		if (commit)
>> +			set_commit_cutoff(commit);
> 
> Another nice replacement.
> 
>> +# A-B-C-D-E-main
>> +#
>> +# Where C has a non-monotonically increasing commit timestamp w.r.t. other
>> +# commits
>> +test_expect_success 'non-monotonic commit dates setup' '
>> +	UNIX_EPOCH_ZERO="@0 +0000" &&
>> +	git init non-monotonic &&
>> +	test_commit -C non-monotonic A &&
>> +	test_commit -C non-monotonic --no-tag B &&
>> +	test_commit -C non-monotonic --no-tag --date "$UNIX_EPOCH_ZERO" C &&
>> +	test_commit -C non-monotonic D &&
>> +	test_commit -C non-monotonic E
>> +'
>> +
>> +test_expect_success 'name-rev with commitGraph handles non-monotonic timestamps' '
>> +	test_config -C non-monotonic core.commitGraph true &&
>> +	(
>> +		cd non-monotonic &&
>> +
>> +		# Ensure commit graph is up to date
>> +		git -c gc.writeCommitGraph=true gc &&
> 
> "git commit-graph write --reachable" would suffice here.
> 
> 

Ah. I was looking for something like that but couldn't find it. I saw
this in another test which is what i settled on. Will fix.

>> +
>> +		echo "main~3 tags/D~2" >expect &&
>> +		git name-rev --tags main~3 >actual &&
>> +
>> +		test_cmp expect actual
>> +	)
>> +'
>> +
>> +test_expect_success 'name-rev --all works with non-monotonic' '
> 
> This is working because of the commit-graph, right? We still have
> it from the previous test, so we aren't testing that this works
> when we only have the commit date as a cutoff.
> 

I can either extend this test or add a separate test which covers this.
The test failed before I added the commit graph line.

>> +	(
>> +		cd non-monotonic &&
>> +
>> +		cat >expect <<-\EOF &&
>> +		E
>> +		D
>> +		D~1
>> +		D~2
>> +		A
>> +		EOF
>> +
>> +		git log --pretty=%H >revs &&
>> +		git name-rev --tags --annotate-stdin --name-only <revs >actual &&
>> +
>> +		test_cmp expect actual
>> +	)
> 
> Do you want to include a test showing the "expected" behavior
> when there isn't a commit-graph file? You might need to delete
> an existing commit-graph (it will exist in the case of
> GIT_TEST_COMMIT_GRAPH=1).
> 

This test actually is intended to show that it works regardless of
whether we have a commit graph. (Because in --annotate-stdin mode we
disable the heuristic since we don't know what commits we'll see in advance)

Is there a good way to delete the graph file?

> I also see that you intended to test the "--all" option, which
> is not included in your test. That's probably the real key to
> getting this test to work correctly. Deleting the graph will
> probably cause a failure on this test unless "--all" is added.
> 

Actually both --all and --annotate-stdin disable the heuristic. However,
I think adding a test for both makes sense.

> Thanks,
> -Stolee
> 





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux