Re: [PATCH v5 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/8/2022 1:59 PM, Junio C Hamano wrote:
> Shaoxuan Yuan <shaoxuan.yuan02@xxxxxxxxx> writes:
> 
>> +
>> +	/*
>> +	 * NEEDSWORK: when reading a submodule, the sparsity settings in the
>> +	 * superproject are incorrectly forgotten or misused. For example:
>> +	 *
>> +	 * 1. "command_requires_full_index"
>> +	 * 	When this setting is turned on for `grep`, only the superproject
>> +	 *	knows it. All the submodules are read with their own configs
>> +	 *	and get prepare_repo_settings()'d. Therefore, these submodules
>> +	 *	"forget" the sparse-index feature switch. As a result, the index
>> +	 *	of these submodules are expanded unexpectedly.
> 
> Is this fundamental, or is it just this version of the patch is
> incomplete in that it still does not propagate the bit from
> the_repository->settings to submodule's settings?  Should a change
> to propagate the bit be included for this topic to be complete?
> 
> To put it another way, when grep with this version of the patch
> recurses into a submodule, does it work correctly even without
> flipping command_requires_full_index on in the "struct repository"
> instance for the submodule?  If so, then the NEEDSWORK above may be
> just performance issue.  If it behaves incorrectly, then it means
> we cannot safely make "git grep" aware of sparse index yet.  It is
> hard to tell which one you meant in the above.
> 
> I think the same question needs to be asked for other points
> (omitted from quoting) in this list.

I think this comment is misplaced. It should either be contained in
the commit message or placed closer to this diff hunk:

>> @@ -537,8 +561,20 @@ static int grep_cache(struct grep_opt *opt,
>>  
>>  		strbuf_setlen(&name, name_base_len);
>>  		strbuf_addstr(&name, ce->name);
>> +		if (S_ISSPARSEDIR(ce->ce_mode)) {
>> +			enum object_type type;
>> +			struct tree_desc tree;
>> +			void *data;
>> +			unsigned long size;
>> +
>> +			data = read_object_file(&ce->oid, &type, &size);
>> +			init_tree_desc(&tree, data, size);
>>  
>> -		if (S_ISREG(ce->ce_mode) &&
>> +			hit |= grep_tree(opt, pathspec, &tree, &name, 0, 0);
>> +			strbuf_setlen(&name, name_base_len);
>> +			strbuf_addstr(&name, ce->name);
>> +			free(data);
>> +		} else if (S_ISREG(ce->ce_mode) &&

The conclusion we were trying to reach is that you (Junio) correctly
identified a bug in how we were calling grep_tree() in this hunk in
its v4 form.

HOWEVER: it "doesn't matter" because the sparse index doesn't work
at all within a submodule. Specifically, if a super-repo does not
enable sparse-checkout, but the submodule _does_, then we don't
know how Git will behave currently. His reasonings go on to explain
why the situation is fraught:

* command_requires_full_index is set in a builtin only for the
  top-level project, so when we traverse into a submodule, we don't
  re-check if the current builtin has integrated with sparse index
  and expand a sparse index to a full one.

* core_apply_sparse_checkout is a global not even associated with
  a repository struct. What happens when a super project is not
  sparse but a submodule is? Or vice-versa? I honestly don't know,
  and it will require testing to find out.

Shaoxuan's comment is attempting to list the reasons why submodules
do not currently work with sparse-index, and specifically that we
can add tests that _should_ exercise this code in a meaningful way,
but because of the current limitations of the codebase, the code
isn't actually exercised in that scenario.

In order to actually create a test that demonstrates how submodules
and sparse-checkout work with this logic, we need to do some serious
refactoring of the sparse-checkout logic to care about the repository
struct, along with some other concerns specifically around the sparse
index. This doesn't seem appropriate for the GSoC timeline or even for
just this topic.

Victoria and I have noted this issue down and will try to find time
to investigate further, with a target of being able to actually
exercise this grep_tree() call within a sparse index in a submodule,
giving us full confidence that name_base_len is the correct value to
put in that parameter.

Thanks,
-Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux