Re: [PATCH] sparse-checkout: avoid staging deletions of all files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/4/2020 4:17 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@xxxxxxxxx>
> 
> sparse-checkout's purpose is to update the working tree to have it
> reflect a subset of the tracked files.  As such, it shouldn't be
> switching branches, making commits, downloading or uploading data, or
> staging or unstaging changes.  Other than updating the worktree, the
> only thing sparse-checkout should touch is the SKIP_WORKTREE bit of the
> index.  In particular, this sets up a nice invariant: running
> sparse-checkout will never change the status of any file in `git status`
> (reflecting the fact that we only set the SKIP_WORKTREE bit if the file
> is safe to delete, i.e. if the file is unmodified).
> 
> Traditionally, we did a _really_ bad job with this goal.  The
> predecessor to sparse-checkout involved manual editing of
> .git/info/sparse-checkout and running `git read-tree -mu HEAD`.  That
> command would stage and unstage changes and overwrite dirty changes in
> the working tree.
> 
> The initial implementation of the sparse-checkout command was no better;
> it simply invoked `git read-tree -mu HEAD` as a subprocess and had the
> same caveats, though this issue came up repeatedly in review comments
> and workarounds for the problems were put in place before the feature
> was merged[1, 2, 3, 4, 5, 6; especially see 4 & 6].
> 
> [1] https://lore.kernel.org/git/CABPp-BFT9A5n=_bx5LsjCvbogqwSjiwgr5amcjgbU1iAk4KLJg@xxxxxxxxxxxxxx/
> [2] https://lore.kernel.org/git/CABPp-BEmwSwg4tgJg6nVG8a3Hpn_g-=ZjApZF4EiJO+qVgu4uw@xxxxxxxxxxxxxx/
> [3] https://lore.kernel.org/git/CABPp-BFV7TA0qwZCQpHCqx9N+JifyRyuBQ-pZ_oGfe-NOgyh7A@xxxxxxxxxxxxxx/
> [4] https://lore.kernel.org/git/CABPp-BHYCCD+Vx5fq35jH82eHc1-P53Lz_aGNpHJNcx9kg2K-A@xxxxxxxxxxxxxx/
> [5] https://lore.kernel.org/git/CABPp-BF+JWYZfDqp2Tn4AEKVp4b0YMA=Mbz4Nz62D-gGgiduYQ@xxxxxxxxxxxxxx/
> [6] https://lore.kernel.org/git/20191121163706.GV23183@xxxxxxxxxx/
> 
> However, these workarounds, in addition to disabling the feature in a
> number of important cases, also missed one special case.  I'll get back
> to it later.
> 
> In the 2.27.0 cycle, the disabling of the feature was lifted by finally
> replacing the internal equivalent of `git read-tree -mu HEAD` with
> something that did what we wanted: the new update_sparsity() function in
> unpack-trees.c that only ever updates SKIP_WORKTREE bits in the index
> and updates the working tree to match.  This new function handles all
> the cases that were problematic for the old implementation, except that
> it breaks the same special case that avoided the workarounds of the old
> implementation, but broke it in a different way.
> 
> So...that brings us to the special case: a git clone performed with
> --no-checkout.  As per the meaning of the flag, --no-checkout does not
> check out any branch, with the implication that you aren't on one and
> need to switch to one after the clone.  Implementationally, HEAD is
> still set (so in some sense you are partially on a branch), but
>   * the index is "unborn" (non-existent)
>   * there are no files in the working tree (other than .git/)
>   * the next time git switch (or git checkout) is run it will run
>     unpack_trees with `initial_checkout` flag set to true.
> It is not until you run, e.g. `git switch <somebranch>` that the index
> will be written and files in the working tree populated.
> 
> With this special --no-checkout case, the traditional `read-tree -mu
> HEAD` behavior would have done the equivalent of acting like checkout --
> switch to the default branch (HEAD), write out an index that matches
> HEAD, and update the working tree to match.  This special case slipped
> through the avoid-making-changes checks in the original sparse-checkout
> command and thus continued there.
> 
> After update_sparsity() was introduced and used (see commit f56f31af03
> ("sparse-checkout: use new update_sparsity() function", 2020-03-27)),
> the behavior for the --no-checkout case changed:  Due to git's
> auto-vivification of an empty in-memory index (see do_read_index() and
> note that `must_exist` is false), and due to sparse-checkout's
> update_working_directory() code to always write out the index after it
> was done, we got a new bug.  That made it so that sparse-checkout would
> switch the repository from a clone with an "unborn" index (i.e. still
> needing an initial_checkout), to one that had a recorded index with no
> entries.  Thus, instead of all the files appearing deleted in `git
> status` being known to git as a special artifact of not yet being on a
> branch, our recording of an empty index made it suddenly look to git as
> though it was definitely on a branch with ALL files staged for deletion!
> A subsequent checkout or switch then had to contend with the fact that
> it wasn't on an initial_checkout but had a bunch of staged deletions.
 
Thank you for the detailed summary of how we got to this point and
what is going on. When I asked you to look into this, I did ask for
you to spend "less than an hour" only because I hoped the fix would
be easy. Clearly, there are a lot of things going on here, but you
seem to have a firm grasp on everything.

> Make sure that sparse-checkout changes nothing in the index other than
> the SKIP_WORKTREE bit; in particular, when the index is unborn we do not
> have any branch checked out so there is no sparsification or
> de-sparsification work to do.  Simply return from
> update_working_directory() early.

This makes sense to me. The sparse-checkout file will still be
modified, so when the user performs a checkout then Git will
respect the set patterns. Your test confirms this, too!

I think this behavior is a reasonable compromise from the
previous behavior and being as correct as possible now. Before,
the "git read-tree -mu HEAD" portion of "git sparse-chekcout set"
would populate the working directory, and now it will not.
This more closely matches how a user interacts with --no-checkout
in the non-sparse case.

> Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
> ---
>     sparse-checkout: avoid staging deletions of all files
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-801%2Fnewren%2Fsparse-checkout-and-unborn-index-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-801/newren/sparse-checkout-and-unborn-index-v1
> Pull-Request: https://github.com/git/git/pull/801
> 
>  builtin/sparse-checkout.c          |  4 ++++
>  t/t1091-sparse-checkout-builtin.sh | 19 +++++++++++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 95d08824172..595463be68e 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -99,6 +99,10 @@ static int update_working_directory(struct pattern_list *pl)
>  	struct lock_file lock_file = LOCK_INIT;
>  	struct repository *r = the_repository;
>  
> +	/* If no branch has been checked out, there are no updates to make. */
> +	if (is_index_unborn(r->index))
> +		return UPDATE_SPARSITY_SUCCESS;
> +
>  	memset(&o, 0, sizeof(o));
>  	o.verbose_update = isatty(2);
>  	o.update = 1;
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 88cdde255cd..bc287e5c1fa 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -100,6 +100,25 @@ test_expect_success 'clone --sparse' '
>  	check_files clone a
>  '
>  
> +test_expect_success 'interaction with clone --no-checkout (unborn index)' '
> +	git clone --no-checkout "file://$(pwd)/repo" clone_no_checkout &&
> +	git -C clone_no_checkout sparse-checkout init --cone &&
> +	git -C clone_no_checkout sparse-checkout set folder1 &&
> +	git -C clone_no_checkout sparse-checkout list >actual &&
> +	cat >expect <<-\EOF &&
> +	folder1
> +	EOF
> +	test_cmp expect actual &&
> +	ls clone_no_checkout >actual &&
> +	test_must_be_empty actual &&

My only comment on the test case is to see if you could use
the "check_files" macro instead of "ls". See 761e3d26
(sparse-checkout: improve OS ls compatibility, 2019-12-20)
for details.

> +	test_path_is_missing clone_no_checkout/.git/index &&
> +
> +	# No branch is checked out until we manually switch to one
> +	git -C clone_no_checkout switch master &&
> +	test_path_is_file clone_no_checkout/.git/index &&
> +	check_files clone_no_checkout a folder1
> +'

Thanks!
-Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux