Re: [GSOC][PATCH v2] diff-index: enable sparse index

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Raghul Nanth A wrote:
> diff-index uses the run_diff_index() function to generate its diff. This
> function has been made sparse-index aware in the series that led to
> 8d2c3732 (Merge branch 'ld/sparse-diff-blame', 2021-12-21). Hence we can
> just set the requires-full-index to false for "diff-index".
> 
> Performance metrics
> 
>   Test                                                  HEAD~1            HEAD
>   ----------------------------------------------------------------------------------------------
>   2000.2: echo >>a && git diff-index HEAD (full-v3)     0.09(0.06+0.04)   0.09(0.07+0.03) +0.0%
>   2000.3: echo >>a && git diff-index HEAD (full-v4)     0.09(0.06+0.04)   0.09(0.05+0.05) +0.0%
>   2000.4: echo >>a && git diff-index HEAD (sparse-v3)   0.37(0.31+0.06)   0.01(0.02+0.03) -97.3%
>   2000.5: echo >>a && git diff-index HEAD (sparse-v4)   0.30(0.26+0.05)   0.01(0.01+0.04) -96.7%
>   2000.6: git diff-index HEAD **a (full-v3)             0.06(0.05+0.01)   0.06(0.06+0.01) +0.0%
>   2000.7: git diff-index HEAD **a (full-v4)             0.06(0.05+0.01)   0.06(0.04+0.02) +0.0%
>   2000.8: git diff-index HEAD **a (sparse-v3)           0.29(0.25+0.03)   0.01(0.01+0.00) -96.6%
>   2000.9: git diff-index HEAD **a (sparse-v4)           0.37(0.34+0.02)   0.01(0.01+0.00) -97.3%
>   2000.10: git diff-index --cached HEAD (full-v3)       0.05(0.03+0.01)   0.05(0.03+0.02) +0.0%
>   2000.11: git diff-index --cached HEAD (full-v4)       0.05(0.03+0.01)   0.05(0.02+0.02) +0.0%
>   2000.12: git diff-index --cached HEAD (sparse-v3)     0.35(0.33+0.01)   0.01(0.00+0.00) -97.1%
>   2000.13: git diff-index --cached HEAD (sparse-v4)     0.35(0.32+0.02)   0.01(0.00+0.00) -97.1%
> ---
> 
> Sorry for the late reply. Got caught up in school work
>   * Fixed commit message
>   * Added check to expand index if needed (based on diff-files)
>   * Added pathspec based tests

Please include the range-diff comparing the previous version to the new one
in your future iterations & patch series in general. GitGitGadget adds it by
default, but if you're using 'send-email' you should be able to use the
'--range-diff' option to generate it (see MyFirstContribution [1] for more
information).

[1] https://git-scm.com/docs/MyFirstContribution#v2-git-send-email

> 
>  builtin/diff-index.c                     |  9 +++++
>  t/perf/p2000-sparse-operations.sh        |  3 ++
>  t/t1092-sparse-checkout-compatibility.sh | 44 ++++++++++++++++++++++++
>  3 files changed, 56 insertions(+)
> 
> diff --git a/builtin/diff-index.c b/builtin/diff-index.c
> index 35dc9b23ee..e67cf5a1db 100644
> --- a/builtin/diff-index.c
> +++ b/builtin/diff-index.c
> @@ -24,6 +24,14 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
>  		usage(diff_cache_usage);
>  
>  	git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
> +
> +	prepare_repo_settings(the_repository);
> +	the_repository->settings.command_requires_full_index = 0;
> +
> +	if (pathspec_needs_expanded_index(the_repository->index,
> +					  &rev.diffopt.pathspec))
> +		ensure_full_index(the_repository->index);

Re: my last review [2] - did you look into the behavior of 'diff' with
pathspecs and whether this 'pathspec_needs_expanded_index()' could be
centralized (in e.g. 'run_diff_index()')? What did you find?

[2] https://lore.kernel.org/git/91d3fd23-8120-db65-481a-e9f56017bb04@xxxxxxxxxx/

> +
>  	repo_init_revisions(the_repository, &rev, prefix);
>  	rev.abbrev = 0;
>  	prefix = precompose_argv_prefix(argc, argv, prefix);
> @@ -69,6 +77,7 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
>  		perror("repo_read_index");
>  		return -1;
>  	}
> +
>  	result = run_diff_index(&rev, option);
>  	result = diff_result_code(&rev.diffopt, result);
>  	release_revisions(&rev);
> diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
> index 3242cfe91a..62499d3aa8 100755
> --- a/t/perf/p2000-sparse-operations.sh
> +++ b/t/perf/p2000-sparse-operations.sh
> @@ -125,5 +125,8 @@ test_perf_on_all git checkout-index -f --all
>  test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
>  test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a"
>  test_perf_on_all git grep --cached --sparse bogus -- "f2/f1/f1/*"
> +test_perf_on_all 'echo >>a && git diff-index HEAD'
> +test_perf_on_all git diff-index HEAD "**a"
> +test_perf_on_all git diff-index --cached HEAD
>  
>  test_done
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 801919009e..24bc716c48 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1996,6 +1996,50 @@ test_expect_success 'sparse index is not expanded: rm' '
>  	ensure_not_expanded rm -r deep
>  '
>  
> +test_expect_success 'sparse index is not expanded: diff-index' '
> +	init_repos &&
> +
> +	echo "new" >>sparse-index/g &&
> +	git -C sparse-index add g &&
> +	git -C sparse-index commit -m "dummy" &&
> +	ensure_not_expanded diff-index HEAD~1 &&
> +
> +	echo "text" >>sparse-index/deep/a &&
> +
> +	ensure_not_expanded diff-index HEAD deep/a &&
> +	ensure_not_expanded diff-index HEAD deep/*
> +'

nit: please add a newline here (after the 'sparse index is not expanded:
diff-index' test) to stay consistent with the other tests in the file.

> +test_expect_success 'diff-index pathspec expands index when necessary' '
> +	init_repos &&
> +
> +	echo "text" >>sparse-index/deep/a &&
> +
> +	# pathspec that should expand index
> +	! ensure_not_expanded diff-index "*/a" &&

Using '! ensure_not_expanded' will fail if the command expands the index
_or_ if the command fails altogether, which could inadvertently make these
tests pass even when there's a breakage in 'diff-index'. An
'ensure_expanded' function was created in [3] to test these types of cases;
you can use that here if you base your branch on 'sl/diff-files-sparse' (see
SubmittingPatches for more information [4]).

[3] https://lore.kernel.org/git/20230322161820.3609-3-cheskaqiqi@xxxxxxxxx/
[4] https://git-scm.com/docs/SubmittingPatches#base-branch

> +	! ensure_not_expanded diff-index "**a"

Git pathspec syntax [5] does not follow glob rules (without the ':(glob)'
prefix, at least), so the '**' doesn't do anything special here that a
single '*' wouldn't do. So, to make it clear that you aren't using glob
patterns, it might be better to use '*a' instead. 

Also, why are the wildcard pathspecs here in double-quotes, but the ones in
the previous test ('sparse index is not expanded: diff-index') aren't?

[5] https://git-scm.com/docs/gitglossary#Documentation/gitglossary.txt-aiddefpathspecapathspec

> +'
> +
> +test_expect_success 'diff-index with pathspec outside sparse definition' '
> +	init_repos &&
> +
> +	test_sparse_match test_must_fail git diff-index HEAD folder2/a
> +'
> +
> +test_expect_success 'match all: diff-index' '
> +	init_repos &&
> +
> +	test_all_match git diff-index HEAD &&
> +	write_script edit-contents <<-\EOF &&
> +	echo text >>$1
> +	EOF
> +	run_on_all ../edit-contents g &&
> +	run_on_all git add g &&
> +	run_on_all git commit -m "two" &&
> +	run_on_all rm g &&
> +	test_all_match git diff-index HEAD &&
> +	test_all_match git diff-index --cached HEAD~1
> +'
> +
>  test_expect_success 'grep with and --cached' '
>  	init_repos &&
>  




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux