Re: [PATCH v3 8/8] sparse-checkout: clear tracked sparse dirs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 19, 2021 at 1:48 AM Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:
>
> Hi Stolee,
>
> On Tue, 17 Aug 2021, Derrick Stolee via GitGitGadget wrote:
>
> > From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
> >
> > When changing the scope of a sparse-checkout using cone mode, we might
> > have some tracked directories go out of scope. The current logic removes
> > the tracked files from within those directories, but leaves the ignored
> > files within those directories. This is a bit unexpected to users who
> > have given input to Git saying they don't need those directories
> > anymore.
> >
> > This is something that is new to the cone mode pattern type: the user
> > has explicitly said "I want these directories and _not_ those
> > directories." The typical sparse-checkout patterns more generally apply
> > to "I want files with with these patterns" so it is natural to leave
> > ignored files as they are. This focus on directories in cone mode
> > provides us an opportunity to change the behavior.
> >
> > Leaving these ignored files in the sparse directories makes it
> > impossible to gain performance benefits in the sparse index. When we
> > track into these directories, we need to know if the files are ignored
> > or not, which might depend on the _tracked_ .gitignore file(s) within
> > the sparse directory. This depends on the indexed version of the file,
> > so the sparse directory must be expanded.
> >
> > By deleting the sparse directories when changing scope (or running 'git
> > sparse-checkout reapply') we regain these performance benefits as if the
> > repository was in a clean state.
> >
> > Since these ignored files are frequently build output or helper files
> > from IDEs, the users should not need the files now that the tracked
> > files are removed. If the tracked files reappear, then they will have
> > newer timestamps than the build artifacts, so the artifacts will need to
> > be regenerated anyway.
> >
> > Use the sparse-index as a data structure in order to find the sparse
> > directories that can be safely deleted. Re-expand the index to a full
> > one if it was full before.
>
> This description makes sense, and is easy to explain.
>
> It does not cover the case where untracked files are found and the
> directory is _not_ removed as a consequence, though. I would like to ask
> to add this to the commit message, because it is kind of important.
>
> The implementation of this behavior looks fine to me.
>
> About this behavior itself: in my experience, the more tricky a feature is
> to explain, the more likely the design should have been adjusted in the
> first place. And I find myself struggling a little bit to explain what
> files `git switch` touches in cone mode in addition to tracked files.

I share this concern.

> So I wonder whether an easier-to-explain behavior would be the following:
> ignored files in directories that fell out of the sparse-checkout cone are
> deleted. (Even if there are untracked, unignored files in the same
> directory tree.)
>
> This is different than what this patch implements: we would now have to
> delete the ignored and out-of-cone files _also_ when there are untracked
> files in the same directory, i.e. we could no longer use the sweet
> `remove_dir_recursively()` call. Therefore, the implementation of what I
> suggested would be much more complicated: you would have to enumerate the
> ignored files and remove them individually.

The ignored files are already enumerated by the fill_directory call;
you just need to iterate over the dir->ignored_nr entries in
dir->ignored.

> Having said that, even after mulling over this behavior and sleeping over
> it, I am unsure what the best way forward would be. Just because it is
> easy to explain does not make it right.
>
> It is tricky to decide mostly because "ignored" files are definitely not
> always build output. Apart from VIM's temporary files, users like me
> frequently write other files and/or directories that we simply do not want
> to see tracked in Git. For example, I often test things in an `a1.c` file
> that -- for convenience -- lives in the current worktree. Obviously I
> don't want Git to track it, but I also don't want it to be deleted, so I
> often add corresponding lines to `.git/info/exclude`. Likewise, I
> sometimes download additional information related to what I am
> implementing, and that also lives in the current worktree (but then, I
> usually am too lazy to add an entry to `.git/info/exclude` in those
> cases).

I do the same thing, and I know other users that do as well...but I
don't put such files in directories that are irrelevant to me.  I
create cruft files near other files that I'm working on, or in a
special directory of its own, but not under some directory that is
irrelevant to the areas I'm working on.

For reference, we implemented something like this in our `sparsify`
wrapper we have internally, where 'git clean -fdX <all sparse
directories>` is executed whenever folks sparsify.  (We have our own
tool and don't have users use sparse-checkout directly, because our
tool computes dependencies to determine which directories are needed.)
 I was really hesitant to add that cleaning behavior by default, and
just made it an option.  My colleagues tired of all the bug reports
about left-around directories and made it the default, waiting to hear
complaints.  We never got one.  It's been over a year.

> Now, I don't want to over-index on my own habits. There are so many users
> out there, and most of them have different preferences from mine.
>
> Which leaves me to wonder whether we need at least a flag to turn this
> behavior on and off? Something like
> `core.ignoredFilesInSparseConesArePrecious = true` (obviously with a
> better, shorter name).

That seems fine, but I suspect you're projecting the same concerns I
had, and it turns out much less useful than you'd expect (perhaps even
totally useless).



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux