Re: Bug report: `git restore --source --staged` deals poorly with sparse-checkout

Elijah Newren <newren@xxxxxxxxx> · Wed, 5 Oct 2022 00:51:43 -0700

On Tue, Oct 4, 2022 at 9:53 PM Martin von Zweigbergk
<martinvonz@xxxxxxxxxx> wrote:
>
> On Tue, Oct 4, 2022 at 1:35 PM Victoria Dye <vdye@xxxxxxxxxx> wrote:
> >
> > Martin von Zweigbergk wrote:
> > >> On Tue, Oct 4, 2022 at 9:34 AM Victoria Dye <vdye@xxxxxxxxxx <mailto:vdye@xxxxxxxxxx>> wrote:
> > >>
[...]
> > >> Thanks for reporting this! There are a few confusing things going on with
> > >> 'restore' here.
> > >>
> > >> First is that the out-of-cone was even restored in the first place.
> > >>
> > >
> > > I was actually happy that the out-of-cone paths were restored. I ran that
> > > command as an experiment while reading Elijah's doc because I was curious
> > > what would happen. The reason I think it should restore out-of-cone paths is
> > > so you can do `git restore --staged --source <some commit> && git commit -m
> > > "restore to old commit"` without caring about the sparse spec.
> >
> > Conversely, that's behavior a user *wouldn't* want if they want to keep
> > their sparse cone intact (not to mention the performance impact of checking
> > out the entire worktree). I think it does more harm to those users than it
> > would benefit the ones that want to checkout out-of-cone files.
> >
> > The use-case you're describing should be served by the
> > '--ignore-skip-worktree-bits' option (not the most intuitive name,
> > unfortunately). Luckily, there's an increasing desire to improve the naming
> > of sparse-related options, so the UX situation should improve in the future.
>
> I realized after sending my previous email that I might have a
> different view of what
> sparse checkout is about. To me, it seems like it should be just a performance
> optimization. That's why I feel like commands should behave the same way with
> or without a sparse spec (unless that proposed `--restrict` flag is passed). I
> understand if that's just not feasible. Sorry about the noise in that case :)

The problem I see with that definition is that I'm not even sure what
that means.  Behaving the same way with or without a sparse
specification at the extreme means that switching branches should
populate all the files in that branch...meaning a dense checkout.
That kind of defeats the point.  So, I'm sure you don't mean that they
behave the same in all cases...but where do you draw the line?

Is it just switch/checkout & `reset --hard` that avoid reading and
writing outside the sparse specification?

Should diff & status ignore files outside the sparse specification
even if users wrote to such files?  A "performance optimization" might
suggest we should, but would users get confused?

What about merge/rebase/cherry-pick/revert?  Should those write
additional files to the working tree or avoid it?  What about if there
are conflicts outside the sparse specification?

And if extra files get written outside the sparse specification, are
there ways for this to get "cleaned up" where after resolving
conflicts or changes we can again remove the file from the working
tree?

What about `git grep PATTERN`?  That's documented to search the
tracked files in the working tree.  But should that include the files
that would have been there were it not for the "performance
optimization" of not checking them out?  (Similarly, what about `git
grep --cached PATTERN` or `git grep PATTERN REVISION`?)  I mean, if
these commands should behave the same regardless of sparse
specification, then you should search those other files, right?  But
isn't that a nasty performance penalty if the user has a
sparse/partial clone since Git will have to do many network operations
to get additional blobs in order to search them?  Is that really
wanted?

What about `git rm '*.png'` to remove all the tracked png files from
my working tree.  Should that also remove all the files that would
have been there were it not for the "performance optimization"?  Will
that result in very negative surprises for those with a "I want to
concentrate on just this subset of files" mental model?

What about `git worktree add`?  Should the sparse specification be the
same for all worktrees?  Be per-worktree?  Should it default to dense?
 To just top-level sparse?  To the same sparsity as the worktree you
were in when you created a new one?

There are more questions along these lines, but perhaps this gives a
flavor of the kinds of questions that can come up which need
underlying user mental models and usecases to help us answer how these
commands should behave.

> > But suppose you're doing something like 'git restore --staged --no-worktree
> > --ignore-skip-worktree-bits --source HEAD^'. In that case:
> >
> > - you are restoring to the index
> > - you are *not* restoring to the worktree
> > - you're restoring files with SKIP_WORKTREE applied
> >
> > Right now, SKIP_WORKTREE is removed from the matched entries, but suppose
> > (per your comment) it wasn't. That wouldn't mean that we've "modified the
> > working copy"; the working tree is defined with respect to the index, and if
> > the index entry says "I don't care about the worktree", then there are no
> > differences to reflect.
>
> Yes, not technically changed. I was (and still am) thinking of the working copy
> as logically containing all the files even if some of them are not written out
> to disk.

Git started with a nearly identical definition and still sadly
includes it (though buried deep in a plumbing manual that thankfully
virtually no one will ever read):

       Skip-worktree bit can be defined in one (long) sentence: When reading
       an entry, if it is marked as skip-worktree, then Git pretends its
       working directory version is up to date and read the index version
       instead.

Such a mental model is useless to me in answering how commands should
behave in any interesting cases, and actually led to several
inconsistencies and bugs[1].  However, you didn't leave it there; you
took it a step further...

[1] https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@xxxxxxxxxxxxxx/

> I understand if that seems like an odd way of thinking about
> it. It might
> help to think about how it might appear in a virtual file system where clean
> files outside the sparse spec are presented from the index. Or it may be better
> to not try to think about it in this weird way at all :)

Now you've provided a real usecase (virtual file system really
pretending that all files are present), which gives us some context
with which we can try to answer those questions I asked above.  I
think some of the answers might be different for this usecase than the
"behavior A" and "behavior B" in my current sparse-checkout.txt
document, so maybe we want to add this usecase to the mix.

Anyway, thanks for the interesting bug report, and the context of
another usecase to consider.