On Thu, Jul 29, 2021 at 11:27 AM Derrick Stolee via GitGitGadget <gitgitgadget@xxxxxxxxx> wrote: > > My commits are technically based on ds/add-with-sparse-index, but could > apply earlier, I believe. > > We launched an experimental release [1] of the sparse-index feature to our > internal users. We immediately discovered a problem due to the isolated way > in which we tested the sparse index: we were never building the project and > changing our sparse-checkout definition. > > [1] https://github.com/microsoft/git/releases/tag/v2.32.0.vfs.0.102.exp > > Users who ran a build in one project and then moved to another still had > build artifacts in their worktree that lived inside the old directories. > Since the files are marked by the .gitignore patterns, these files were not > removed by the 'git sparse-checkout set' command. However, they make the > sparse-index unusable because every 'git status' command needs to expand the > sparse-directory entries in order to see if the files are tracked or not. > This made the first experimental release actually slower for all users > because of this cost. As highlighted at https://lore.kernel.org/git/CABPp-BHwRHhYE3BPxCG+fRFWf2dCZ60rnO=ykztquKoOp_x_QQ@xxxxxxxxxxxxxx/ ? """So, although 'git status' shouldn't have to check the tracked files anymore, it is still going to have to look at each of the *untracked* files and compare to .gitignore files in order to correctly classify each file as ignored or just plain untracked.""" As I noted there, """about a month ago my colleagues made the tool just auto-invoke [--remove-old-ignores] from other sparsify invocations. To my knowledge, there have been no complaints about that being automatically turned on; but there were complaints/confusion before about the directories being left around. (Of course, non-ignored files are still left around by that option.)""" I think your changes here shouldn't be limited to a sparse-index but should apply to cone-mode sparse checkouts generally. > The solution we shipped to these customers was to change the way our fork > handles these ignored files. Specifically, instead of Git completely > ignoring the files, we changed Git to understand that with cone-mode > sparse-checkout patterns, the users is asking for entire directories to be > removed from the worktree. The link [1] included earlier has this change. Could you clarify what you are removing? 1) Tracked but now SKIP_WOKRTREE files 2) .gitignore'd files 3) other untracked files (1) has always been removed by sparse-checkout. I think you're adding (2) but it's not clear if you are also removing (3). > I believe that this is a reasonable expectation, though I recognize that it > might look like breaking the expectations of how .gitignore files work. > > To limit the scope of this change, the focus is limited to behave this way > only when the sparse-index is enabled, because otherwise the sparse index > does not behave as intended. With that restriction in mind, the > implementation can use the sparse directory entries to assist. > > I imagine that when the sparse index is mature enough, we will enable it by > default when cone mode sparse-checkout is enabled, in which case this > behavior would expand to all users with cone mode sparse-checkout. I'm all in favor of just doing it for all cone mode users, at least for .gitignore'd files. > I'm interested in the community's thoughts about this change, as it seems > like one that we should make carefully and intentionally. Yeah, I approached it pretty cautiously, just adding a new option in our tool (`sparsify --remove-old-ignores`) because I wasn't sure if it should be done by default but wanted to provide an easy way for users to access it. We got numerous complaints and after some months, added a config option to run it automatically. We still got complaints and eventually my colleagues just made it automatically run and provided no way to turn it off. No user complaints since.