On Thu, Aug 19, 2021 at 1:48 AM Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote: > > Hi Stolee, > > On Tue, 17 Aug 2021, Derrick Stolee via GitGitGadget wrote: > > > From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> > > > > When changing the scope of a sparse-checkout using cone mode, we might > > have some tracked directories go out of scope. The current logic removes > > the tracked files from within those directories, but leaves the ignored > > files within those directories. This is a bit unexpected to users who > > have given input to Git saying they don't need those directories > > anymore. > > > > This is something that is new to the cone mode pattern type: the user > > has explicitly said "I want these directories and _not_ those > > directories." The typical sparse-checkout patterns more generally apply > > to "I want files with with these patterns" so it is natural to leave > > ignored files as they are. This focus on directories in cone mode > > provides us an opportunity to change the behavior. > > > > Leaving these ignored files in the sparse directories makes it > > impossible to gain performance benefits in the sparse index. When we > > track into these directories, we need to know if the files are ignored > > or not, which might depend on the _tracked_ .gitignore file(s) within > > the sparse directory. This depends on the indexed version of the file, > > so the sparse directory must be expanded. > > > > By deleting the sparse directories when changing scope (or running 'git > > sparse-checkout reapply') we regain these performance benefits as if the > > repository was in a clean state. > > > > Since these ignored files are frequently build output or helper files > > from IDEs, the users should not need the files now that the tracked > > files are removed. If the tracked files reappear, then they will have > > newer timestamps than the build artifacts, so the artifacts will need to > > be regenerated anyway. > > > > Use the sparse-index as a data structure in order to find the sparse > > directories that can be safely deleted. Re-expand the index to a full > > one if it was full before. > > This description makes sense, and is easy to explain. > > It does not cover the case where untracked files are found and the > directory is _not_ removed as a consequence, though. I would like to ask > to add this to the commit message, because it is kind of important. > > The implementation of this behavior looks fine to me. > > About this behavior itself: in my experience, the more tricky a feature is > to explain, the more likely the design should have been adjusted in the > first place. And I find myself struggling a little bit to explain what > files `git switch` touches in cone mode in addition to tracked files. I share this concern. > So I wonder whether an easier-to-explain behavior would be the following: > ignored files in directories that fell out of the sparse-checkout cone are > deleted. (Even if there are untracked, unignored files in the same > directory tree.) > > This is different than what this patch implements: we would now have to > delete the ignored and out-of-cone files _also_ when there are untracked > files in the same directory, i.e. we could no longer use the sweet > `remove_dir_recursively()` call. Therefore, the implementation of what I > suggested would be much more complicated: you would have to enumerate the > ignored files and remove them individually. The ignored files are already enumerated by the fill_directory call; you just need to iterate over the dir->ignored_nr entries in dir->ignored. > Having said that, even after mulling over this behavior and sleeping over > it, I am unsure what the best way forward would be. Just because it is > easy to explain does not make it right. > > It is tricky to decide mostly because "ignored" files are definitely not > always build output. Apart from VIM's temporary files, users like me > frequently write other files and/or directories that we simply do not want > to see tracked in Git. For example, I often test things in an `a1.c` file > that -- for convenience -- lives in the current worktree. Obviously I > don't want Git to track it, but I also don't want it to be deleted, so I > often add corresponding lines to `.git/info/exclude`. Likewise, I > sometimes download additional information related to what I am > implementing, and that also lives in the current worktree (but then, I > usually am too lazy to add an entry to `.git/info/exclude` in those > cases). I do the same thing, and I know other users that do as well...but I don't put such files in directories that are irrelevant to me. I create cruft files near other files that I'm working on, or in a special directory of its own, but not under some directory that is irrelevant to the areas I'm working on. For reference, we implemented something like this in our `sparsify` wrapper we have internally, where 'git clean -fdX <all sparse directories>` is executed whenever folks sparsify. (We have our own tool and don't have users use sparse-checkout directly, because our tool computes dependencies to determine which directories are needed.) I was really hesitant to add that cleaning behavior by default, and just made it an option. My colleagues tired of all the bug reports about left-around directories and made it the default, waiting to hear complaints. We never got one. It's been over a year. > Now, I don't want to over-index on my own habits. There are so many users > out there, and most of them have different preferences from mine. > > Which leaves me to wonder whether we need at least a flag to turn this > behavior on and off? Something like > `core.ignoredFilesInSparseConesArePrecious = true` (obviously with a > better, shorter name). That seems fine, but I suspect you're projecting the same concerns I had, and it turns out much less useful than you'd expect (perhaps even totally useless).