On 9/8/2022 1:59 PM, Junio C Hamano wrote: > Shaoxuan Yuan <shaoxuan.yuan02@xxxxxxxxx> writes: > >> + >> + /* >> + * NEEDSWORK: when reading a submodule, the sparsity settings in the >> + * superproject are incorrectly forgotten or misused. For example: >> + * >> + * 1. "command_requires_full_index" >> + * When this setting is turned on for `grep`, only the superproject >> + * knows it. All the submodules are read with their own configs >> + * and get prepare_repo_settings()'d. Therefore, these submodules >> + * "forget" the sparse-index feature switch. As a result, the index >> + * of these submodules are expanded unexpectedly. > > Is this fundamental, or is it just this version of the patch is > incomplete in that it still does not propagate the bit from > the_repository->settings to submodule's settings? Should a change > to propagate the bit be included for this topic to be complete? > > To put it another way, when grep with this version of the patch > recurses into a submodule, does it work correctly even without > flipping command_requires_full_index on in the "struct repository" > instance for the submodule? If so, then the NEEDSWORK above may be > just performance issue. If it behaves incorrectly, then it means > we cannot safely make "git grep" aware of sparse index yet. It is > hard to tell which one you meant in the above. > > I think the same question needs to be asked for other points > (omitted from quoting) in this list. I think this comment is misplaced. It should either be contained in the commit message or placed closer to this diff hunk: >> @@ -537,8 +561,20 @@ static int grep_cache(struct grep_opt *opt, >> >> strbuf_setlen(&name, name_base_len); >> strbuf_addstr(&name, ce->name); >> + if (S_ISSPARSEDIR(ce->ce_mode)) { >> + enum object_type type; >> + struct tree_desc tree; >> + void *data; >> + unsigned long size; >> + >> + data = read_object_file(&ce->oid, &type, &size); >> + init_tree_desc(&tree, data, size); >> >> - if (S_ISREG(ce->ce_mode) && >> + hit |= grep_tree(opt, pathspec, &tree, &name, 0, 0); >> + strbuf_setlen(&name, name_base_len); >> + strbuf_addstr(&name, ce->name); >> + free(data); >> + } else if (S_ISREG(ce->ce_mode) && The conclusion we were trying to reach is that you (Junio) correctly identified a bug in how we were calling grep_tree() in this hunk in its v4 form. HOWEVER: it "doesn't matter" because the sparse index doesn't work at all within a submodule. Specifically, if a super-repo does not enable sparse-checkout, but the submodule _does_, then we don't know how Git will behave currently. His reasonings go on to explain why the situation is fraught: * command_requires_full_index is set in a builtin only for the top-level project, so when we traverse into a submodule, we don't re-check if the current builtin has integrated with sparse index and expand a sparse index to a full one. * core_apply_sparse_checkout is a global not even associated with a repository struct. What happens when a super project is not sparse but a submodule is? Or vice-versa? I honestly don't know, and it will require testing to find out. Shaoxuan's comment is attempting to list the reasons why submodules do not currently work with sparse-index, and specifically that we can add tests that _should_ exercise this code in a meaningful way, but because of the current limitations of the codebase, the code isn't actually exercised in that scenario. In order to actually create a test that demonstrates how submodules and sparse-checkout work with this logic, we need to do some serious refactoring of the sparse-checkout logic to care about the repository struct, along with some other concerns specifically around the sparse index. This doesn't seem appropriate for the GSoC timeline or even for just this topic. Victoria and I have noted this issue down and will try to find time to investigate further, with a target of being able to actually exercise this grep_tree() call within a sparse index in a submodule, giving us full confidence that name_base_len is the correct value to put in that parameter. Thanks, -Stolee