Re: [PATCH] describe: enable sparse index for describe

Derrick Stolee <derrickstolee@xxxxxxxxxx> · Tue, 28 Mar 2023 15:46:38 -0400

On 3/27/23 2:26 PM, Junio C Hamano wrote:
> "Raghul Nanth A via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> 
>>  builtin/describe.c                       |   2 +
>>  t/perf/p2000-sparse-operations.sh        |  14 +-
>>  t/t1092-sparse-checkout-compatibility.sh |  10 +
>>  t/t6121-describe-sparse.sh               | 675 +++++++++++++++++++++++
>>  4 files changed, 697 insertions(+), 4 deletions(-)
>>  create mode 100755 t/t6121-describe-sparse.sh
> 
> This copying of a file with 600+ lines only to touch up a handful
> lines (like a 20+ lines patch) is almost criminal.  Imagine the
> effort to keep them in sync over time, when "describe" itself may
> learn new features and improved output, independent from the
> sparse-index compatibility.
> 
> Can't we do better than this with a bit of refactoring?
> 
>> diff --git a/builtin/describe.c b/builtin/describe.c
>> index 5b5930f5c8c..7ff9b5e4b20 100644
>> --- a/builtin/describe.c
>> +++ b/builtin/describe.c
>> @@ -654,6 +654,8 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
>>  			int fd, result;
>>  
>>  			setup_work_tree();
>> +			prepare_repo_settings(the_repository);
>> +			the_repository->settings.command_requires_full_index = 0;
> 
> Offhand, the only case I know that "describe" even _needs_ a working
> tree or the index is when asked to do the "--dirty" thing.  To
> figure out if the working tree files are modified, the code calls
> into run_diff_index(), but has that codepath been made sparse-index
> aware already?

It seems that this is a case where we can rely on the existing
changes around run_diff_index(), which is nice. We get a very
easy win for a narrow case.

And I agree about the test case situation. It would suffice to
show some checks that the result is the same across all cases
in t1092 for 'git describe --dirty'. Those should be the only
new correctness tests necessary for this change.

>> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh

>> @@ -86,7 +89,8 @@ test_expect_success 'setup repo and indexes' '
>>  		git sparse-checkout set $SPARSE_CONE &&
>>  		git config index.version 4 &&
>>  		git update-index --index-version=4 &&
>> -		git checkout HEAD~4
>> +		git checkout HEAD~4 &&
>> +		git tag -a v1.0 -m "Final"
>>  	)
>>  '
> 
> It is unclear from the proposed commit log what the relevance of
> adding a step to create an annotated tag to these tests.  It is not
> like any later step uses that tag to figure out anything.  There may
> be good reasons to add these tags (otherwise you would not be adding
> them to these tests), but please explain why in the proposed log
> message so that future readers of the "git log -p" do not have to
> ask this question.

I imagine that 'git describe' reports something better when a tag
is reachable from HEAD. Would be good to make that clear.

Indeed, when removing these lines and running the test on a repo
without any tags, the test fails with this message:

  fatal: No names found, cannot describe anything.

These tags could be added earlier in the test, in one step:

diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index 3242cfe91a0..ba13317c942 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -45,6 +45,7 @@ test_expect_success 'setup repo and indexes' '
 	git sparse-checkout init --cone &&
 	git sparse-checkout set $SPARSE_CONE &&
 	git checkout -b wide $OLD_COMMIT &&
+	git tag -a v1.0 -m "final" &&
 
 	for l2 in f1 f2 f3 f4
 	do

The tests then run on the four examples cloned from this copy.

>> @@ -125,5 +129,7 @@ test_perf_on_all git checkout-index -f --all
>>  test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>>  test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a"
>>  test_perf_on_all git grep --cached --sparse bogus -- "f2/f1/f1/*"
>> +test_perf_on_all git describe --dirty
>> +test_perf_on_all 'echo >> new && git describe --dirty'
>>  
>>  test_done
> 
> Just like '>', '>>' is a rediraction operator and should have SP
> before it (you got it right) and no SP between it and its operand.
> I.e.
> 
> 	echo >>new && git describe --dirty
> 
> You have the same in t1092, I think.

Also, since you are adding these performance tests, it would be
nice to see their results in the commit message. You can get
values without and with this change using (from t/perf/):

GIT_PERF_REPEAT_COUNT=10 ./run HEAD~1 HEAD -- p2000-sparse-operations.sh

For example, I ran this on my machine (after deleting the other tests
so it ran faster) and got these results:

Test                                                      HEAD~1            HEAD                  
--------------------------------------------------------------------------------------------------
2000.2: git describe --dirty (full-v3)                    0.36(0.07+0.32)   0.45(0.08+0.37) +25.0%
2000.3: git describe --dirty (full-v4)                    0.39(0.08+0.32)   0.42(0.08+0.35) +7.7% 
2000.4: git describe --dirty (sparse-v3)                  1.49(0.91+0.58)   0.33(0.04+0.59) -77.9%
2000.5: git describe --dirty (sparse-v4)                  1.48(0.92+0.57)   0.34(0.04+0.60) -77.0%
2000.6: echo >> new && git describe --dirty (full-v3)     0.37(0.07+0.32)   0.44(0.08+0.36) +18.9%
2000.7: echo >> new && git describe --dirty (full-v4)     0.40(0.08+0.32)   0.42(0.08+0.36) +5.0% 
2000.8: echo >> new && git describe --dirty (sparse-v3)   1.59(0.97+0.62)   0.33(0.04+0.57) -79.2%
2000.9: echo >> new && git describe --dirty (sparse-v4)   1.64(0.98+0.64)   0.31(0.03+0.54) -81.1%

Thanks,
-Stolee