RE: [PATCH 1/1] reset: fix reset when using the sparse-checkout feature.

Kevin Willford <kewillf@xxxxxxxxxxxxx> · Tue, 12 Sep 2017 20:20:06 +0000

> From: Junio C Hamano [mailto:gitster@xxxxxxxxx]
> Sent: Monday, September 11, 2017 9:57 PM
> 
> Let's imagine a path P that is outside the sparse checkout area.
> And we check out a commit that has P.  P would still be recorded in
> the index but it would not materialize in the working tree.  "git
> status" and friends are asked not to treat this as "locally removed",
> to prevent "commit -a" from recording a removal, of course.
> 
> Now, let's further imagine that you have a checkout of the same
> project but at a commit that does not have P.  Then you reset to
> another commit that does have P.  My understanding of what Kevin's
> first test wants to demonstrate is that the index is populated with
> P (because you did reset to a commit with that path) but it does not
> materialize in the working tree (perhaps because that is outside the
> sparse checkout area?), yet there is something missing compared to
> the earlier case where "git status" and friends are asked not to
> treat P as "locally removed".  They instead show P as locally removed,
> and "commit -a" would record a removal---that is indeed a problem.
> 
> Am I reading the problem description correctly so far?  If so, then
> my answer to my first question (are we solving a right problem?) is
> yes.
> 

I think this is where I need to do a better job of explaining so here is a
simple example.

I have a file "a" that was added in the latest commit.  
$ git log --oneline
c1fa646 (HEAD -> reset, master) add file a
40b342c Initial commit with file i

Running the reset without using a sparse-checkout file

$ git reset HEAD~1
$ git status
On branch reset
Untracked files:
  (use "git add <file>..." to include in what will be committed)

        a

nothing added to commit but untracked files present (use "git add" to track)

Turning on sparse-checkout and running checkout to make my working
directory sparse

$ git config core.sparsecheckout true
$ echo /i > .git/info/sparse-checkout
$ git checkout -f

Running reset gives me 
$ git reset HEAD~1
$ git status
On branch reset
nothing to commit, working tree clean
$ git ls-files
i

file a is gone.  Not in the index and not in the working directory.
Nothing to let the user know that anything changed.

With a modified file no sparse-checkout
$ git log --oneline
6fbd34a (HEAD -> reset, modified) modified file m
c734d72 Initial commit with file i and m
$ git reset HEAD~1
Unstaged changes after reset:
M       m
$ git status
On branch reset
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   m

no changes added to commit (use "git add" and/or "git commit -a")

With sparse-checkout
$ git reset HEAD~1
Unstaged changes after reset:
D       m
$ git status
On branch reset
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        deleted:    m

no changes added to commit (use "git add" and/or "git commit -a")

I think we can agree that this is not the correct behavior.

> But this time, I am trying to see if the approach is good.  I am not
> so sure if the approach taken by this patch is so obviously good as
> you seem to think.  A logical consequence of the approach "git
> status thinks that P appears in the index and missing in the working
> tree is a local removal, so let's squelch it by creating the file in
> the working tree" is that we will end up fully populating the
> working tree with paths that are clearly outside the area the user
> designated as the sparse checkout area---surely that may squelch
> "status", but to me, it looks like it is breaking what the user
> wanted to achieve with "sparse checkout" in the first place.
> 

I don't think that we are trying to "squelch" status so much as make
it consistent with what the user would expect to happen.  If that means
not resetting entries with the skip-worktree bit or resetting the entries
but keeping the skip-worktree bit on, okay, but I would argue that is
not what the user wants because if you are now saying that sparse
means git will not change files outside the sparse-checkout entries,
what about merge, rebase, cherry-pick, apply?  Should those only
change the files that are in the sparse definition?  If so we would
be changing the commits from the original, i.e.  cherry-pick 123 would
create a different commit depending on whether or not you are using
sparse as well as a different commit depending on what is in your
sparse-checkout.  

I see reset being a similar scenario in that if everything is clean, after I
"reset HEAD~1" I should be able to run "add ." + "commit" and have
the same commit as before the reset.  If this is changed to only reset
the sparse entries, there will be staged changes after the reset because
HEAD has changed but we didn't update the index versions of the files.
If we do update the index with the "HEAD~1" version of the files and just
set the skip-worktree bit then the next commit will not have the original
changes outside the sparse-checkout for the commit.

> When we make a sparse checkout that places P outside the checked-out
> area, with P in the index and not in the working tree, what asks
> "git status" and friends not to treat P as "locally removed"?
> Shouldn't "reset HEAD~1" that resurrected P that was missing in the
> commit in the state before you did the "reset" be doing the same
> (e.g. flipping the bit in the index for path P that says "not having
> this in the working tree is not a removal---it is deliberately not
> checked out")?  If that is the right approach to solve the issue
> (which we established is a right problem to solve already), and the
> approach that the patch wants to take is quite the opposite of it,
> then my answer to the second question (are we solving the problem
> with the right approach?) is no.  And at that point, it is moot to
> ask if the code correctly and/or efficiently implements that wrong
> solution, isn't it?

I agree that we must determine if there is a problem, which I hope
we can agree that there is a valid problem to be addressed.  Determining
if I used the right approach depends on how you see reset working
correctly using the sparse feature.  Should it match a reset when not
using sparse or should it reset only sparse files?  I saw issues
if only the sparse files are the ones reset, which is the reason that
I went with this approach.

Thanks,
Kevin