Re: [PATCH v5 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Shaoxuan Yuan wrote:
> +
> +	/*
> +	 * NEEDSWORK: when reading a submodule, the sparsity settings in the
> +	 * superproject are incorrectly forgotten or misused. For example:
> +	 *
> +	 * 1. "command_requires_full_index"
> +	 * 	When this setting is turned on for `grep`, only the superproject
> +	 *	knows it. All the submodules are read with their own configs
> +	 *	and get prepare_repo_settings()'d. Therefore, these submodules
> +	 *	"forget" the sparse-index feature switch. As a result, the index
> +	 *	of these submodules are expanded unexpectedly.
> +	 *
> +	 * 2. "core_apply_sparse_checkout"
> +	 *	When running `grep` in the superproject, this setting is
> +	 *	populated using the superproject's configs. However, once
> +	 *	initialized, this config is globally accessible and is read by
> +	 *	prepare_repo_settings() for the submodules. For instance, if a
> +	 *	submodule is using a sparse-checkout, however, the superproject
> +	 *	is not, the result is that the config from the superproject will
> +	 *	dictate the behavior for the submodule, making it "forget" its
> +	 *	sparse-checkout state.
> +	 *
> +	 * 3. "core_sparse_checkout_cone"
> +	 *	ditto.

These are interesting observations, thank you for describing the behavior in
detail.

- #1 might seem like an easy fix - since 'command_requires_full_index' is
  tied to the command (not properties of the repo), the logical thing to do
  would be to propagate the value from the superproject to the subproject.
  However, that fix will undoubtedly expose lots of places where we're not
  handling the sparse index correctly in submodules. Since this isn't a
  problem introduced by your patch series, I'm content leaving this for a
  later series.
- #2 is an odd situation, but I'm guessing that the effect here will be
  minimal (since, regardless of the 'core_*' sparse-checkout globals,
  'SKIP_WORKTREE' will still be applied to - and respected on - entries in
  the index). It's more worrisome for commands that recurse submodules and
  *write* the index (e.g., 'git read-tree'), but that's also outside the
  scope of this series.

Given this information, I think your approach is (for the time being) a safe
one. Beyond the submodule issues, I'm happy with the rest of your
'grep_tree()' updates.

> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 63becc3138..fda05faadf 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1987,7 +2000,48 @@ test_expect_success 'grep is not expanded' '
>  
>  	# All files within the folder1/* pathspec are sparse,
>  	# so this command does not find any matches
> -	ensure_not_expanded ! grep a -- folder1/*
> +	ensure_not_expanded ! grep a -- folder1/* &&
> +
> +	# test out-of-cone pathspec with or without wildcard
> +	ensure_not_expanded grep --sparse --cached a -- "folder1/a" &&
> +	ensure_not_expanded grep --sparse --cached a -- "folder1/*" &&
> +
> +	# test in-cone pathspec with or without wildcard
> +	ensure_not_expanded grep --sparse --cached a -- "deep/a" &&
> +	ensure_not_expanded grep --sparse --cached a -- "deep/*"

Thanks for the new tests (re: [1])! 

[1] https://lore.kernel.org/git/4b65d7dc-e711-43a6-8763-62be79a3e4a9@xxxxxxxxxx/

> +'
> +
> +# NEEDSWORK: when running `grep` in the superproject with --recurse-submodules,
> +# Git expands the index of the submodules unexpectedly. Even though `grep`
> +# builtin is marked as "command_requires_full_index = 0", this config is only
> +# useful for the superproject. Namely, the submodules have their own configs,
> +# which are _not_ populated by the one-time sparse-index feature switch.
> +test_expect_failure 'grep within submodules is not expanded' '
> +	init_repos_as_submodules &&
> +
> +	# do not use ensure_not_expanded() here, becasue `grep` should be
> +	# run in the superproject, not in "./sparse-index"
> +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
> +	git grep --sparse --cached --recurse-submodules a -- "*/folder1/*" &&
> +	test_region ! index ensure_full_index trace2.txt
> +'

So this test is *only* demonstrating that the submodules' indexes are
expanded (incorrectly, hence the 'test_expect_failure'); it doesn't show
that 'git grep' returns the correct results...

> +
> +# NEEDSWORK: this test is not actually testing the code. The design purpose
> +# of this test is to verify the grep result when the submodules are using a
> +# sparse-index. Namely, we want "folder1/" as a tree (a sparse directory); but
> +# because of the index expansion, we are now grepping the "folder1/a" blob.
> +# Because of the problem stated above 'grep within submodules is not expanded',
> +# we don't have the ideal test environment yet.
> +test_expect_success 'grep sparse directory within submodules' '
> +	init_repos_as_submodules &&
> +
> +	cat >expect <<-\EOF &&
> +	full-checkout/folder1/a:a
> +	sparse-checkout/folder1/a:a
> +	sparse-index/folder1/a:a
> +	EOF
> +	git grep --sparse --cached --recurse-submodules a -- "*/folder1/*" >actual &&
> +	test_cmp actual expect
>  '

...but this test *does* show that those results are correct. I think it was
a good decision to keep the two separate, since only the index expansion
behavior is wrong (thus warranting the 'test_expect_failure'). The output of
'git grep' is still what we want it to be, so it gets a
'test_expect_success'.

>  
>  test_done




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux