Re: [PATCH v3 8/8] sparse-checkout: clear tracked sparse dirs

Elijah Newren <newren@xxxxxxxxx> · Fri, 20 Aug 2021 09:15:47 -0700

On Fri, Aug 20, 2021 at 8:50 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>
> On 8/19/2021 4:48 AM, Johannes Schindelin wrote:
> > On Tue, 17 Aug 2021, Derrick Stolee via GitGitGadget wrote:
> > This description makes sense, and is easy to explain.
> >
> > It does not cover the case where untracked files are found and the
> > directory is _not_ removed as a consequence, though. I would like to ask
> > to add this to the commit message, because it is kind of important.
>
> Right. I should have modified the message from my earlier version when
> the issues with untracked files came up.

:+1:

> > The implementation of this behavior looks fine to me.
> >
> > About this behavior itself: in my experience, the more tricky a feature is
> > to explain, the more likely the design should have been adjusted in the
> > first place. And I find myself struggling a little bit to explain what
> > files `git switch` touches in cone mode in addition to tracked files.
>
> Keep in mind that 'git switch' does not change the sparse-checkout cone,
> and the activity being described is something that would happen within
> 'git sparse-checkout set' or 'git sparse-checkout reapply'.

Good points.

> > So I wonder whether an easier-to-explain behavior would be the following:
> > ignored files in directories that fell out of the sparse-checkout cone are
> > deleted. (Even if there are untracked, unignored files in the same
> > directory tree.)
> >
> > This is different than what this patch implements: we would now have to
> > delete the ignored and out-of-cone files _also_ when there are untracked
> > files in the same directory, i.e. we could no longer use the sweet
> > `remove_dir_recursively()` call. Therefore, the implementation of what I
> > suggested would be much more complicated: you would have to enumerate the
> > ignored files and remove them individually.
>
> Outside of "it's harder to write that feature"

I don't think it is; see my other email.

> perhaps I could convince
> you that it is better to do nothing in the presence of untracked files.
>
> If a user has an untracked file within a directory that is leaving the
> sparse cone, then that means they were doing something in that space and
> perhaps has unfinished work. By leaving the files on-disk, they have an
> opportunity to revert the change to the sparse-checkout cone and continue
> their work interrupted. This includes keeping things like build artifacts
> (that are ignored) so they can incrementally build from that position.
>
> The general thought I have here is: having untracked, not-ignored files
> in a directory that is leaving the sparse-checkout cone is an unexpected
> behavior, so we should do as little as possible in that scenario.
>
> It also makes it more clear to the user what happened: "You had untracked
> files here, so we left them alone" is easier to understand than "You had
> untracked files here, so we deleted the ones that were ignored and kept
> the rest."

This explanation seems reasonable, but it certainly belongs in the
commit message.

However, doesn't it defeat the point of your removing the ignored
files?  Not only do you have directories full of files, but you won't
be able to have sparse directory entries due to the need to have
.gitignore files from underneath them.

Perhaps this is okay because we expect this to be an unusual case.
But, if so, do we want a more verbose warning (not with every
directory that fails to be deleted, but just at the end if there were
any), suggesting to the user that they clean up the relevant
directories and do a sparse-checkout reapply?  And, ultimately, once
the directory is gone, is that good enough to allow us to keep our
indexes sparse and fast, or are we looking at a huge amount of effort
spent on .gitignore files underneath sparse directories?

> > Having said that, even after mulling over this behavior and sleeping over
> > it, I am unsure what the best way forward would be. Just because it is
> > easy to explain does not make it right.
>
> Of course, you already have a retort to my claim that "simpler is better",
> but I'll just focus on the point that "simpler for the user to understand"
> is a different point than "simpler to implement".

I'm confused now.  Which behavior are you arguing is simpler for the
user to understand?