On 6/4/2020 4:17 AM, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@xxxxxxxxx> > > sparse-checkout's purpose is to update the working tree to have it > reflect a subset of the tracked files. As such, it shouldn't be > switching branches, making commits, downloading or uploading data, or > staging or unstaging changes. Other than updating the worktree, the > only thing sparse-checkout should touch is the SKIP_WORKTREE bit of the > index. In particular, this sets up a nice invariant: running > sparse-checkout will never change the status of any file in `git status` > (reflecting the fact that we only set the SKIP_WORKTREE bit if the file > is safe to delete, i.e. if the file is unmodified). > > Traditionally, we did a _really_ bad job with this goal. The > predecessor to sparse-checkout involved manual editing of > .git/info/sparse-checkout and running `git read-tree -mu HEAD`. That > command would stage and unstage changes and overwrite dirty changes in > the working tree. > > The initial implementation of the sparse-checkout command was no better; > it simply invoked `git read-tree -mu HEAD` as a subprocess and had the > same caveats, though this issue came up repeatedly in review comments > and workarounds for the problems were put in place before the feature > was merged[1, 2, 3, 4, 5, 6; especially see 4 & 6]. > > [1] https://lore.kernel.org/git/CABPp-BFT9A5n=_bx5LsjCvbogqwSjiwgr5amcjgbU1iAk4KLJg@xxxxxxxxxxxxxx/ > [2] https://lore.kernel.org/git/CABPp-BEmwSwg4tgJg6nVG8a3Hpn_g-=ZjApZF4EiJO+qVgu4uw@xxxxxxxxxxxxxx/ > [3] https://lore.kernel.org/git/CABPp-BFV7TA0qwZCQpHCqx9N+JifyRyuBQ-pZ_oGfe-NOgyh7A@xxxxxxxxxxxxxx/ > [4] https://lore.kernel.org/git/CABPp-BHYCCD+Vx5fq35jH82eHc1-P53Lz_aGNpHJNcx9kg2K-A@xxxxxxxxxxxxxx/ > [5] https://lore.kernel.org/git/CABPp-BF+JWYZfDqp2Tn4AEKVp4b0YMA=Mbz4Nz62D-gGgiduYQ@xxxxxxxxxxxxxx/ > [6] https://lore.kernel.org/git/20191121163706.GV23183@xxxxxxxxxx/ > > However, these workarounds, in addition to disabling the feature in a > number of important cases, also missed one special case. I'll get back > to it later. > > In the 2.27.0 cycle, the disabling of the feature was lifted by finally > replacing the internal equivalent of `git read-tree -mu HEAD` with > something that did what we wanted: the new update_sparsity() function in > unpack-trees.c that only ever updates SKIP_WORKTREE bits in the index > and updates the working tree to match. This new function handles all > the cases that were problematic for the old implementation, except that > it breaks the same special case that avoided the workarounds of the old > implementation, but broke it in a different way. > > So...that brings us to the special case: a git clone performed with > --no-checkout. As per the meaning of the flag, --no-checkout does not > check out any branch, with the implication that you aren't on one and > need to switch to one after the clone. Implementationally, HEAD is > still set (so in some sense you are partially on a branch), but > * the index is "unborn" (non-existent) > * there are no files in the working tree (other than .git/) > * the next time git switch (or git checkout) is run it will run > unpack_trees with `initial_checkout` flag set to true. > It is not until you run, e.g. `git switch <somebranch>` that the index > will be written and files in the working tree populated. > > With this special --no-checkout case, the traditional `read-tree -mu > HEAD` behavior would have done the equivalent of acting like checkout -- > switch to the default branch (HEAD), write out an index that matches > HEAD, and update the working tree to match. This special case slipped > through the avoid-making-changes checks in the original sparse-checkout > command and thus continued there. > > After update_sparsity() was introduced and used (see commit f56f31af03 > ("sparse-checkout: use new update_sparsity() function", 2020-03-27)), > the behavior for the --no-checkout case changed: Due to git's > auto-vivification of an empty in-memory index (see do_read_index() and > note that `must_exist` is false), and due to sparse-checkout's > update_working_directory() code to always write out the index after it > was done, we got a new bug. That made it so that sparse-checkout would > switch the repository from a clone with an "unborn" index (i.e. still > needing an initial_checkout), to one that had a recorded index with no > entries. Thus, instead of all the files appearing deleted in `git > status` being known to git as a special artifact of not yet being on a > branch, our recording of an empty index made it suddenly look to git as > though it was definitely on a branch with ALL files staged for deletion! > A subsequent checkout or switch then had to contend with the fact that > it wasn't on an initial_checkout but had a bunch of staged deletions. Thank you for the detailed summary of how we got to this point and what is going on. When I asked you to look into this, I did ask for you to spend "less than an hour" only because I hoped the fix would be easy. Clearly, there are a lot of things going on here, but you seem to have a firm grasp on everything. > Make sure that sparse-checkout changes nothing in the index other than > the SKIP_WORKTREE bit; in particular, when the index is unborn we do not > have any branch checked out so there is no sparsification or > de-sparsification work to do. Simply return from > update_working_directory() early. This makes sense to me. The sparse-checkout file will still be modified, so when the user performs a checkout then Git will respect the set patterns. Your test confirms this, too! I think this behavior is a reasonable compromise from the previous behavior and being as correct as possible now. Before, the "git read-tree -mu HEAD" portion of "git sparse-chekcout set" would populate the working directory, and now it will not. This more closely matches how a user interacts with --no-checkout in the non-sparse case. > Signed-off-by: Elijah Newren <newren@xxxxxxxxx> > --- > sparse-checkout: avoid staging deletions of all files > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-801%2Fnewren%2Fsparse-checkout-and-unborn-index-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-801/newren/sparse-checkout-and-unborn-index-v1 > Pull-Request: https://github.com/git/git/pull/801 > > builtin/sparse-checkout.c | 4 ++++ > t/t1091-sparse-checkout-builtin.sh | 19 +++++++++++++++++++ > 2 files changed, 23 insertions(+) > > diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c > index 95d08824172..595463be68e 100644 > --- a/builtin/sparse-checkout.c > +++ b/builtin/sparse-checkout.c > @@ -99,6 +99,10 @@ static int update_working_directory(struct pattern_list *pl) > struct lock_file lock_file = LOCK_INIT; > struct repository *r = the_repository; > > + /* If no branch has been checked out, there are no updates to make. */ > + if (is_index_unborn(r->index)) > + return UPDATE_SPARSITY_SUCCESS; > + > memset(&o, 0, sizeof(o)); > o.verbose_update = isatty(2); > o.update = 1; > diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh > index 88cdde255cd..bc287e5c1fa 100755 > --- a/t/t1091-sparse-checkout-builtin.sh > +++ b/t/t1091-sparse-checkout-builtin.sh > @@ -100,6 +100,25 @@ test_expect_success 'clone --sparse' ' > check_files clone a > ' > > +test_expect_success 'interaction with clone --no-checkout (unborn index)' ' > + git clone --no-checkout "file://$(pwd)/repo" clone_no_checkout && > + git -C clone_no_checkout sparse-checkout init --cone && > + git -C clone_no_checkout sparse-checkout set folder1 && > + git -C clone_no_checkout sparse-checkout list >actual && > + cat >expect <<-\EOF && > + folder1 > + EOF > + test_cmp expect actual && > + ls clone_no_checkout >actual && > + test_must_be_empty actual && My only comment on the test case is to see if you could use the "check_files" macro instead of "ls". See 761e3d26 (sparse-checkout: improve OS ls compatibility, 2019-12-20) for details. > + test_path_is_missing clone_no_checkout/.git/index && > + > + # No branch is checked out until we manually switch to one > + git -C clone_no_checkout switch master && > + test_path_is_file clone_no_checkout/.git/index && > + check_files clone_no_checkout a folder1 > +' Thanks! -Stolee