On 6/1/2021 2:31 PM, Tim Renouf wrote: > When doing a checkout (or other index merge from a tree) causes the > removal of a path that is outside sparse-checkout, the file is removed > from the working tree, even if it is dirty. > > That is probably what you want if the file got there by being > materialized in a merge conflict. But it is not what you want if you > deliberately put the file there. > > This commit adds the above config item, defaulting to "true" to get the > old behavior. Set it to "false" to avoid removing such a file from the > worktree. I don't think config is necessary here. This behavior is niche enough to create a behavior-breaking change. However, we do want to ensure that the full flow of eventually clearing the file when clean is handled. > diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt > index a0eeaeb02e..31adb38b1d 100644 > --- a/Documentation/git-sparse-checkout.txt > +++ b/Documentation/git-sparse-checkout.txt > @@ -111,6 +111,17 @@ the sparse-checkout file. > To repopulate the working directory with all files, use the > `git sparse-checkout disable` command. > > +The `core.sparseCheckoutRmFiles` config setting If we _are_ going to go with a config option, then I'm not a big fan of this name. I've also been thinking that the sparse-checkout config has been squatting in the "core.*" space too long. Perhaps it is time to create its own section? What about something like sparseCheckout.keepDirtyFiles, which defaults to false? Alternatively, we could add things to the index.* space. We already have "index.sparse" for the sparse index feature. For this topic, a config option "index.keepDirtySparseFiles" would have a similar meaning to my other suggestion. At the least, you would need to update the appropriate file in Documentation/config/*.txt. > extern int core_apply_sparse_checkout; > extern int core_sparse_checkout_cone; > +extern int core_sparse_checkout_rm_files; These previous variables being global is unfortunate and it might be time to fix that. At minimum, I think this new option might be able to inject somewhere else without running a lot of git_config_get_value() calls in a loop. Since your changes are within unpack-trees.c, maybe adding a flag to 'struct unpack_trees_options' would suffice. It could be initialized in unpack_trees() so only happen once per index update. > +test_expect_success 'core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout removes file from disk' ' This test name is too long. Perhaps 'sparse-checkout removes dirty non-matching file' > + git config core.sparseCheckout false && > + git checkout -f top && > + echo added3 >added3 && > + git add added3 && > + git commit -madded3 && > + git checkout top && > + test_path_is_missing added3 && > + git config core.sparseCheckout true && > + git config core.sparseCheckoutRmFiles true && > + echo init.t >.git/info/sparse-checkout && Perhaps we could use a more modern approach, such as git sparse-checkout init && git sparse-checkout set init.t && (and use 'git sparse-checkout disable' at the start of the test.) > + git checkout HEAD@{1} && I'd typically expect 'git checkout HEAD~1' instead of using the reflog, since it is more robust to changing the test in the future. Better even to create a new branch earlier and then switch between named branches. > + test_path_is_missing added3 && > + echo dirty >added3 && I appreciate that you confirm the file is missing before you create it. > + git checkout top && > + test_path_is_missing added3 Thought: does this happen also with 'git sparse-checkout reapply'? > +' > + > +test_expect_success '!core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout does not remove file from disk' ' and here: 'sparse-checkout keeps dirty non-matching file' These tests are very similar. Perhaps they could be grouped and just have a check at the end that makes 'added3' dirty and checks the behavior of 'git checkout' with the two config settings? > --- a/unpack-trees.c > +++ b/unpack-trees.c > @@ -1528,7 +1528,9 @@ static void mark_new_skip_worktree(struct pattern_list *pl, > > /* > * 1. Pretend the narrowest worktree: only unmerged entries > - * are checked out > + * are checked out. If core.sparseCheckoutRmFiles is off, then > + * treat a file being removed as merged, so it does not get > + * removed from the worktree. > */ > for (i = 0; i < istate->cache_nr; i++) { > struct cache_entry *ce = istate->cache[i]; > @@ -1536,7 +1538,8 @@ static void mark_new_skip_worktree(struct pattern_list *pl, > if (select_flag && !(ce->ce_flags & select_flag)) > continue; > > - if (!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED)) > + if ((!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED)) || > + ((ce->ce_flags & CE_REMOVE) && !core_sparse_checkout_rm_files)) > ce->ce_flags |= skip_wt_flag; > else > ce->ce_flags &= ~skip_wt_flag; > @@ -1681,12 +1684,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options > > if (!o->skip_sparse_checkout) { > /* > - * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1 > + * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1. > + * If !core.sparseCheckoutRmFiles, include files being removed so ones > + * outside sparse-checkout patterns do not get removed from the worktree. > * If they will have NEW_SKIP_WORKTREE, also set CE_SKIP_WORKTREE > * so apply_sparse_checkout() won't attempt to remove it from worktree > */ > + int mask = core_sparse_checkout_rm_files ? CE_ADDED : CE_ADDED | CE_REMOVE; > mark_new_skip_worktree(o->pl, &o->result, > - CE_ADDED, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE, > + mask, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE, > o->verbose_update); I'm a bit worried about this use of CE_REMOVE. I see its use listed in cache-tree.c with this comment: /* * CE_REMOVE entries are removed before the index is * written to disk. Skip them to remain consistent * with the future on-disk index. */ This makes me think that the entries are actually not present in the written index, which is incorrect. It will make it look like we have staged a deletion of that file. Can you check that the output of 'git status' shows the file with no staged changes, but an unstaged _modification_? Alternatively: the modification might be ignored since it is a sparse entry, but we would probably want to fix that before taking this change. If my understanding is correct*, then 'git status' will show it as a staged deletion and an unstaged addition. (*) This is a BIG IF. Thank you for your interest here. Elijah is right that the area is fraught with complexity, and I'm learning something new about it every day as I work on my sparse-index feature. I'm still trying to grasp the subtleties like this and their ramifications before changing the existing behavior because I want to be _sure_ we are moving in a more stable direction and not just another unstable point that frustrates users. This change seems like a focused attempt, but I think we need to track down those other details before committing to such a change: 1. How does a user with a dirty, tracked, sparse file get back into a state where that file is deleted? What commands do they need to run? Can that be tested and added to the sparse- checkout documentation? 2. What does 'git status' look like when a user is in this state? Is that helpful? Thanks, -Stolee