Re: Bug report: `git restore --source --staged` deals poorly with sparse-checkout

Martin von Zweigbergk <martinvonz@xxxxxxxxxx> · Wed, 5 Oct 2022 13:00:00 -0700

On Wed, Oct 5, 2022 at 12:51 AM Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Tue, Oct 4, 2022 at 9:53 PM Martin von Zweigbergk
> <martinvonz@xxxxxxxxxx> wrote:
> >
> > On Tue, Oct 4, 2022 at 1:35 PM Victoria Dye <vdye@xxxxxxxxxx> wrote:
> > >
> > > Martin von Zweigbergk wrote:
> > > >> On Tue, Oct 4, 2022 at 9:34 AM Victoria Dye <vdye@xxxxxxxxxx <mailto:vdye@xxxxxxxxxx>> wrote:
> > > >>
> [...]
> > > >> Thanks for reporting this! There are a few confusing things going on with
> > > >> 'restore' here.
> > > >>
> > > >> First is that the out-of-cone was even restored in the first place.
> > > >>
> > > >
> > > > I was actually happy that the out-of-cone paths were restored. I ran that
> > > > command as an experiment while reading Elijah's doc because I was curious
> > > > what would happen. The reason I think it should restore out-of-cone paths is
> > > > so you can do `git restore --staged --source <some commit> && git commit -m
> > > > "restore to old commit"` without caring about the sparse spec.
> > >
> > > Conversely, that's behavior a user *wouldn't* want if they want to keep
> > > their sparse cone intact (not to mention the performance impact of checking
> > > out the entire worktree). I think it does more harm to those users than it
> > > would benefit the ones that want to checkout out-of-cone files.
> > >
> > > The use-case you're describing should be served by the
> > > '--ignore-skip-worktree-bits' option (not the most intuitive name,
> > > unfortunately). Luckily, there's an increasing desire to improve the naming
> > > of sparse-related options, so the UX situation should improve in the future.
> >
> > I realized after sending my previous email that I might have a
> > different view of what
> > sparse checkout is about. To me, it seems like it should be just a performance
> > optimization. That's why I feel like commands should behave the same way with
> > or without a sparse spec (unless that proposed `--restrict` flag is passed). I
> > understand if that's just not feasible. Sorry about the noise in that case :)
>
> The problem I see with that definition is that I'm not even sure what
> that means.  Behaving the same way with or without a sparse
> specification at the extreme means that switching branches should
> populate all the files in that branch...meaning a dense checkout.
> That kind of defeats the point.  So, I'm sure you don't mean that they
> behave the same in all cases...but where do you draw the line?

I agree with you and Stolee that there are two different cases: some
people use sparse checkouts to restrict what they see (behavior A), and
some people use it just as a performance optimization (behavior B). So I
suspect we roughly agree about what should happen if you pass
`--restrict` (or if that becomes the default so you don't actually need to
pass it). My arguments were about the `--no-restrict` case. Sorry, I
should have made that clear.

I also agree that having a way to make commands restrict to certain paths
by default is useful, and I agree that tying that set of paths to the current
worktree's sparse spec makes sense.

I'll answer the questions below for the `--no-restrict` case
(behavior B).

>
> Is it just switch/checkout & `reset --hard` that avoid reading and
> writing outside the sparse specification?
>
> Should diff & status ignore files outside the sparse specification
> even if users wrote to such files?  A "performance optimization" might
> suggest we should, but would users get confused?

I think they should be included (again, in the `--no-restrict` case).

>
> What about merge/rebase/cherry-pick/revert?  Should those write
> additional files to the working tree or avoid it?  What about if there
> are conflicts outside the sparse specification?

I think they should avoid it, but since the user will need to resolve
that conflict anyway, I can see it makes sense to write them to disk
if there are conflicts.

>
> And if extra files get written outside the sparse specification, are
> there ways for this to get "cleaned up" where after resolving
> conflicts or changes we can again remove the file from the working
> tree?

I've never really used `git sparse-checkout` (until I read your doc),
but isn't that what `git sparse-checkout reapply` is for?

>
> What about `git grep PATTERN`?  That's documented to search the
> tracked files in the working tree.  But should that include the files
> that would have been there were it not for the "performance
> optimization" of not checking them out?  (Similarly, what about `git
> grep --cached PATTERN` or `git grep PATTERN REVISION`?)  I mean, if
> these commands should behave the same regardless of sparse
> specification, then you should search those other files, right?  But
> isn't that a nasty performance penalty if the user has a
> sparse/partial clone since Git will have to do many network operations
> to get additional blobs in order to search them?  Is that really
> wanted?

I think it's consistent to search them with `--no-restrict` (but not
with `--restrict`, of course).

>
> What about `git rm '*.png'` to remove all the tracked png files from
> my working tree.  Should that also remove all the files that would
> have been there were it not for the "performance optimization"?  Will
> that result in very negative surprises for those with a "I want to
> concentrate on just this subset of files" mental model?

Same here.

>
> What about `git worktree add`?  Should the sparse specification be the
> same for all worktrees?  Be per-worktree?  Should it default to dense?
>  To just top-level sparse?  To the same sparsity as the worktree you
> were in when you created a new one?

That's an interesting case. If someone does `git worktree add` and expects
all files to be available in the working copy, they might be surprised, yes.
I think that's a much smaller risk than
`git restore --source HEAD^ --staged && git commit -m 'undo changes'` being
partial, however.