Hi Matheus, On Wed, Jun 17, 2020 at 10:58 AM Matheus Tavares Bernardino <matheus.bernardino@xxxxxx> wrote: > > On Tue, Jun 16, 2020 at 10:21 PM Elijah Newren via GitGitGadget > <gitgitgadget@xxxxxxxxx> wrote: > > > > From: Elijah Newren <newren@xxxxxxxxx> > > > > As noted in commit e7d7c73249 ("git-sparse-checkout: clarify > > interactions with submodules", 2020-06-10), sparse-checkout cannot > > remove submodules even if they don't match the sparsity patterns, > > because doing so would risk data loss -- unpushed, uncommitted, or > > untracked changes could all be lost. That commit also updated the > > documentation to point out that submodule initialization state was a > > parallel, orthogonal reason that entries in the index might not be > > present in the working tree. > > > > However, sparsity and submodule initialization weren't actually fully > > orthogonal yet. The SKIP_WORKTREE handling in unpack_trees would > > attempt to set the SKIP_WORKTREE bit on submodules when the submodule > > did not match the sparsity patterns. This resulted in innocuous but > > potentially alarming warning messages: > > > > warning: unable to rmdir 'sha1collisiondetection': Directory not empty > > > > It could also make things confusing since the entry would be marked as > > SKIP_WORKTREE in the index but actually still be present in the working > > tree: > > > > $ git ls-files -t | grep sha1collisiondetection > > S sha1collisiondetection > > $ ls -al sha1collisiondetection/ | wc -l > > 13 > > > > Submodules have always been their own form of "partial checkout" > > behavior, with their own "present or not" state determined by running > > "git submodule [init|deinit|update]". Enforce that separation by having > > the SKIP_WORKTREE logic not touch submodules and allow submodules to > > continue using their own initialization state for determining if the > > submodule is present. > > Makes sense to me. > > I'm just thinking about the possible implications in grep (with > mt/grep-sparse-checkout). As you mentioned in [1], users might think > of "git grep $rev $pat" as an optimized version of "git checkout $rev > && git grep $pat". And, in this sense, they probably expect the output > of these two operations to be equal. But if we don't set the > SKIP_WORKTREE bit for submodules they might diverge. > > As an example, if we have a repository like: > . > |-- A > | `-- sub > `-- B > > And the [cone-mode] sparsity rules: > /* > !/*/ > /B/ > > Then, "git grep --recurse-submodules $rev $pat" would search only in B > (as A doesn't match the sparsity patterns and thus, is not recursed > into). But "git checkout $rev && git grep --recurse-submodules $pat" > would search in both B and A/sub (as the latter would not have the > SKIP_WORKTREE bit set). > > This might be a problem for git-grep, not git-sparse-checkout. But I'm > not sure how we could solve it efficiently, as the submodule might be > deep down in a path whose first dir was already ignored for not > matching the sparsity patterns. Is this a problem we should consider, > or is it OK if the outputs of these two operations diverge? You always have an eagle eye for catching these things. Good point. So, ignoring submodules for a second... In the case of unmerged files that do not match the sparsity patterns, we previously felt it was fine that git grep $revision $pattern won't search through $revision:${currently_unmerged_file} despite the fact that git grep $pattern will search through ${currently_unmerged_file} from the working directory. If we follow that analogy, then we're fine having those two "almost command equivalents" diverge. Let's dig a little further anyway and compare these "almost command equivalents" a bit more, which as a reminder are (1) "git grep --recurse-submodules $rev $pat" vs. (2) "git checkout $rev && git grep --recurse-submodules $pat". (I just wanted to give them simple numerical labels.) In all cases, we'll be considering a $FILENAME which does not match sparsity patterns but is present in the working directory for reasons listed below: First case: $FILENAME has uncommitted changes: (Command 1) This will search only through files in $REV that match sparsity patterns; $REV:$FILENAME will be excluded (Command 2) If $FILENAME is different between HEAD and $REV this fails and shows nothing, even if other paths in the sparsity patterns had matches. Otherwise, it searches through $FILENAME. If the command is modified to add a -m to the `git checkout` command to make the checkout not fail, then the second command will always search through $REV:$FILENAME, as modified by the dirty changes being merged in. If the user cleans up the file with uncommitted changes first, then the checkout would make the file go away and thus it wouldn't be searched after the switch. Second case: $FILENAME has unmerged changes: (Command 1) This will search through files in $REV that match sparsity patterns; $REV:$FILENAME will be excluded. (Command 2) This will always fail and show nothing, regardless of any other matches. Here the checkout command always fails due to the unmerged entries and won't even allow the user to switch with -m. So, my comparison of commands #1 and #2 isn't quite right, even for non-submodule cases. But let's dig a little further. If someone were to try do change their sparsity patterns or even just run a "git sparse-checkout reapply" when they had the above issues, they'd see something like: $ git sparse-checkout reapply warning: The following paths are unmerged and were left despite sparse patterns: filename_with_conflicts After fixing the above paths, you may want to run `git sparse-checkout reapply`. This basically suggests that we consider uncommitted and unmerged files to be "unclean" in some way (sparse-checkout wants to set the SKIP_WORKTREE bit on all files that do not match the sparsity specification, so "clean" means sparse-checkout is able to do so). So I could amend my earlier comparison and say that IF the user has a clean directory, then "git grep --recurse-submodules $REVISION $PATTERN" should be equivalent to "git checkout $REVISION && git grep --recurse-submodules $PATTERN". I could also say that given the big warnings we give users when we can't set the SKIP_WORKTREE bit, that we expect it to be a transient state and thus that we expect them to more likely than not clear it out by the time they do switch branches. That would lead us to the follow-up rule that if the user does not have a clean directory then "git grep --recurse-submodules $REVISION $PATTERN" should be equivalent to what you would get if the unclean entries were ignored (expecting them to be cleaned before the any `git checkout` could be run) and you then otherwise ran "git checkout $REVISION && git grep --recurse-submodules $PATTERN". That suggests that grep's implementation we agreed on earlier is still correct (when given a $REVISION ignore submodulees that do not match the sparsity patterns), but that unpack-trees/sparse-checkout still need an update: When we notice an initialized submodule that does not match the sparsity patterns, we should print a warning just like we do for unmerged and dirty entries. So my patch needs a bit more work.