Re: [PATCH] unpack-trees: add core.sparseCheckoutRmFiles config

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/1/2021 2:31 PM, Tim Renouf wrote:
> When doing a checkout (or other index merge from a tree) causes the
> removal of a path that is outside sparse-checkout, the file is removed
> from the working tree, even if it is dirty.
> 
> That is probably what you want if the file got there by being
> materialized in a merge conflict. But it is not what you want if you
> deliberately put the file there.
> 
> This commit adds the above config item, defaulting to "true" to get the
> old behavior. Set it to "false" to avoid removing such a file from the
> worktree.

I don't think config is necessary here. This behavior is niche
enough to create a behavior-breaking change. However, we do want
to ensure that the full flow of eventually clearing the file when
clean is handled.

> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index a0eeaeb02e..31adb38b1d 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -111,6 +111,17 @@ the sparse-checkout file.
>  To repopulate the working directory with all files, use the
>  `git sparse-checkout disable` command.
>  
> +The `core.sparseCheckoutRmFiles` config setting

If we _are_ going to go with a config option, then I'm not a big
fan of this name. I've also been thinking that the sparse-checkout
config has been squatting in the "core.*" space too long. Perhaps
it is time to create its own section?

What about something like sparseCheckout.keepDirtyFiles, which
defaults to false?

Alternatively, we could add things to the index.* space. We
already have "index.sparse" for the sparse index feature. For this
topic, a config option "index.keepDirtySparseFiles" would have a
similar meaning to my other suggestion.

At the least, you would need to update the appropriate file in
Documentation/config/*.txt.

>  extern int core_apply_sparse_checkout;
>  extern int core_sparse_checkout_cone;
> +extern int core_sparse_checkout_rm_files;

These previous variables being global is unfortunate and it
might be time to fix that. At minimum, I think this new
option might be able to inject somewhere else without
running a lot of git_config_get_value() calls in a loop.

Since your changes are within unpack-trees.c, maybe adding
a flag to 'struct unpack_trees_options' would suffice. It
could be initialized in unpack_trees() so only happen once
per index update.

> +test_expect_success 'core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout removes file from disk' '

This test name is too long. Perhaps

	'sparse-checkout removes dirty non-matching file'

> +	git config core.sparseCheckout false &&
> +	git checkout -f top &&
> +	echo added3 >added3 &&
> +	git add added3 &&
> +	git commit -madded3 &&
> +	git checkout top &&
> +	test_path_is_missing added3 &&
> +	git config core.sparseCheckout true &&
> +	git config core.sparseCheckoutRmFiles true &&
> +	echo init.t >.git/info/sparse-checkout &&

Perhaps we could use a more modern approach, such as

	git sparse-checkout init &&
	git sparse-checkout set init.t &&

(and use 'git sparse-checkout disable' at the start of the
test.)

> +	git checkout HEAD@{1} &&

I'd typically expect 'git checkout HEAD~1' instead of
using the reflog, since it is more robust to changing
the test in the future. Better even to create a new
branch earlier and then switch between named branches.

> +	test_path_is_missing added3 &&
> +	echo dirty >added3 &&

I appreciate that you confirm the file is missing before
you create it.

> +	git checkout top &&
> +	test_path_is_missing added3

Thought: does this happen also with 'git sparse-checkout
reapply'?

> +'
> +
> +test_expect_success '!core.sparseCheckoutRmFiles checkout that would remove file outside sparse-checkout does not remove file from disk' '

and here:

	'sparse-checkout keeps dirty non-matching file'

These tests are very similar. Perhaps they could be
grouped and just have a check at the end that makes
'added3' dirty and checks the behavior of 'git checkout'
with the two config settings?

> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1528,7 +1528,9 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
>  
>  	/*
>  	 * 1. Pretend the narrowest worktree: only unmerged entries
> -	 * are checked out
> +	 * are checked out. If core.sparseCheckoutRmFiles is off, then
> +	 * treat a file being removed as merged, so it does not get
> +	 * removed from the worktree.
>  	 */
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
> @@ -1536,7 +1538,8 @@ static void mark_new_skip_worktree(struct pattern_list *pl,
>  		if (select_flag && !(ce->ce_flags & select_flag))
>  			continue;
>  
> -		if (!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED))
> +		if ((!ce_stage(ce) && !(ce->ce_flags & CE_CONFLICTED)) ||
> +		    ((ce->ce_flags & CE_REMOVE) && !core_sparse_checkout_rm_files))
>  			ce->ce_flags |= skip_wt_flag;
>  		else
>  			ce->ce_flags &= ~skip_wt_flag;
> @@ -1681,12 +1684,15 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
>  
>  	if (!o->skip_sparse_checkout) {
>  		/*
> -		 * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1
> +		 * Sparse checkout loop #2: set NEW_SKIP_WORKTREE on entries not in loop #1.
> +		 * If !core.sparseCheckoutRmFiles, include files being removed so ones
> +		 * outside sparse-checkout patterns do not get removed from the worktree.
>  		 * If they will have NEW_SKIP_WORKTREE, also set CE_SKIP_WORKTREE
>  		 * so apply_sparse_checkout() won't attempt to remove it from worktree
>  		 */
> +		int mask = core_sparse_checkout_rm_files ? CE_ADDED : CE_ADDED | CE_REMOVE;
>  		mark_new_skip_worktree(o->pl, &o->result,
> -				       CE_ADDED, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
> +				       mask, CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE,
>  				       o->verbose_update);

I'm a bit worried about this use of CE_REMOVE. I see its use listed in
cache-tree.c with this comment:

		/*
		 * CE_REMOVE entries are removed before the index is
		 * written to disk. Skip them to remain consistent
		 * with the future on-disk index.
		 */

This makes me think that the entries are actually not present in the
written index, which is incorrect. It will make it look like we have
staged a deletion of that file. Can you check that the output of
'git status' shows the file with no staged changes, but an unstaged
_modification_? Alternatively: the modification might be ignored since
it is a sparse entry, but we would probably want to fix that before
taking this change. If my understanding is correct*, then 'git status'
will show it as a staged deletion and an unstaged addition.

(*) This is a BIG IF.

Thank you for your interest here. Elijah is right that the area is
fraught with complexity, and I'm learning something new about it
every day as I work on my sparse-index feature. I'm still trying
to grasp the subtleties like this and their ramifications before
changing the existing behavior because I want to be _sure_ we are
moving in a more stable direction and not just another unstable
point that frustrates users.

This change seems like a focused attempt, but I think we need to
track down those other details before committing to such a change:

1. How does a user with a dirty, tracked, sparse file get back
   into a state where that file is deleted? What commands do
   they need to run? Can that be tested and added to the sparse-
   checkout documentation?

2. What does 'git status' look like when a user is in this state?
   Is that helpful?

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux