On Wed, Oct 5, 2022 at 1:00 PM Martin von Zweigbergk <martinvonz@xxxxxxxxxx> wrote: > > On Wed, Oct 5, 2022 at 12:51 AM Elijah Newren <newren@xxxxxxxxx> wrote: > > [...] > I agree with you and Stolee that there are two different cases: some > people use sparse checkouts to restrict what they see (behavior A), and > some people use it just as a performance optimization (behavior B). So I > suspect we roughly agree about what should happen if you pass > `--restrict` (or if that becomes the default so you don't actually need to > pass it). My arguments were about the `--no-restrict` case. Sorry, I > should have made that clear. > > I also agree that having a way to make commands restrict to certain paths > by default is useful, and I agree that tying that set of paths to the current > worktree's sparse spec makes sense. > > I'll answer the questions below for the `--no-restrict` case > (behavior B). I don't think your usecase matches behavior B. I think we should label your VFS usecase as behavior C and define it separately. More on that below... [...] > > What about merge/rebase/cherry-pick/revert? Should those write > > additional files to the working tree or avoid it? What about if there > > are conflicts outside the sparse specification? > > I think they should avoid it, but since the user will need to resolve > that conflict anyway, I can see it makes sense to write them to disk > if there are conflicts. > > > > > And if extra files get written outside the sparse specification, are > > there ways for this to get "cleaned up" where after resolving > > conflicts or changes we can again remove the file from the working > > tree? > > I've never really used `git sparse-checkout` (until I read your doc), > but isn't that what `git sparse-checkout reapply` is for? While that command is available for users that want to manually clean things up proactively, my suspicion is that it is used very rarely -- especially now that we have the present-despite-skipped class of issues fixed. I suspect nearly all cleaning up is actually done as an implicit side-effect of calls to unpack_trees(), which would affect commands such `switch`, the switch-like portion of `checkout`, `reset --hard`, `merge`, `rebase`, and many others. All of these commands have two types of implicit clean-up they do as part of their operation (which could be thought of as a post-processing step): (1) marking *unmodified* files outside the sparsity patterns as SKIP_WORKTREE in the index and removing them from the working tree, and (2) taking files which match the sparsity patterns which were previously SKIP_WORKTREE and flip them to !SKIP_WORKTREE and restore them to the working tree. I've got a few examples of what this clean up looks like over at: https://lore.kernel.org/git/CABPp-BHGrxLPu_S3y2zG-U6uo0rM5TYYEREZa2A=e=d9VZb2PA@xxxxxxxxxxxxxx/ I have no idea how this cleanup affects the VFS usecase; it's very focused towards "sparse checkout means many files should NOT be present in the working tree" which may be at odds with how the VFS stuff is intended to behave. But it's also been part of sparse-checkout behavior the longest; for well over a decade now. > > What about `git grep PATTERN`? That's documented to search the > > tracked files in the working tree. But should that include the files > > that would have been there were it not for the "performance > > optimization" of not checking them out? (Similarly, what about `git > > grep --cached PATTERN` or `git grep PATTERN REVISION`?) I mean, if > > these commands should behave the same regardless of sparse > > specification, then you should search those other files, right? But > > isn't that a nasty performance penalty if the user has a > > sparse/partial clone since Git will have to do many network operations > > to get additional blobs in order to search them? Is that really > > wanted? > > I think it's consistent to search them with `--no-restrict` (but not > with `--restrict`, of course). > > > What about `git rm '*.png'` to remove all the tracked png files from > > my working tree. Should that also remove all the files that would > > have been there were it not for the "performance optimization"? Will > > that result in very negative surprises for those with a "I want to > > concentrate on just this subset of files" mental model? > > Same here. > > > > > What about `git worktree add`? Should the sparse specification be the > > same for all worktrees? Be per-worktree? Should it default to dense? > > To just top-level sparse? To the same sparsity as the worktree you > > were in when you created a new one? > > That's an interesting case. If someone does `git worktree add` and expects > all files to be available in the working copy, they might be surprised, yes. > I think that's a much smaller risk than > `git restore --source HEAD^ --staged && git commit -m 'undo changes'` being > partial, however. After you described the VFS usecase, I was guessing you'd answer how you did for most of these commands. Most of your answers do not match the answers I'd expect for behavior B, which seems to me to support my suspicion that you've got a third usecase. In particular, I think the difference between Behavior B and your usecase hinges on the expectation for the working tree: Behavior B: Files outside the sparse specification are NOT present in the working tree. Behavior C (your usecase): Files outside the sparse specification ARE "present" in the working tree, but Git doesn't have to put them there (they'll be lazily put into place by something else, and the VFS will ensure that users don't ever notice them actually missing, so far all intents and purposes, the files are present). In particular, that difference is perhaps most notable with `git grep` (without --cached or REVISION flags); such a command is supposed to search the worktree. For Behavior B, files outside the sparse specification are NOT present in the working tree, and hence those files should NOT be searched. For your usecase, as you highlight above, you view all files as present in the working tree (even if Git isn't the thing writing those files to the working tree and even if they aren't technically present until you query whether they are there), so all those files SHOULD be searched. This difference about the "presence" of files has other knock-on effects too. Under Behavior B, users get used to working on just a subset of files. Thus `git rm '*.jpg'` or `git restore --source HEAD^ -- '*.md'` should NOT overwrite files outside the sparse specification (but an error should be shown if the pathspec doesn't match anything, and that error should point out how users can affect other files outside the sparse specification). Under your usecase, users are always working on the full set of files and all of them can be viewed in their working copy (as enforced by the filesystem intercepting any attempts to view or edit files and suddenly magically materializing them when users look) -- so users in your usecase are not expecting to be working on a subset of files, and thus those commands would operate tree-wide. Similarly, under Behavior B, `git add outside/of/cone/path` should throw an error. If it doesn't, some future command will silently remove the file from the working copy, which may confuse the user; they are getting themselves into an erroneous state. Users are pointed to an override flag they can use if they want to accept the consequences. Under your usecase, since ALL files are always "present" (but not materialized until an attempt to access is made), that same command would be expected to run without an override and with no error or warning. Related to the above, under Behavior B, `git status` should probably report the existence of any untracked files that do not match the sparsity patterns as an erroneous condition (because why wait until git-add to throw an error; let the user know early). Under your usecase, we wouldn't. You might think I'm describing Behavior A above, but only because Behavior A and Behavior B overlap on worktree-related operations. The primary difference between Behavior A and Behavior B is with behavior of history-related operations. Behavior B says "the working tree is sparse, but history is dense; if I do a query on older revisions of history (grep/diff/log/etc.) then give me results across all paths", whereas Behavior A says "I only care about this subset of files, both for the working tree and for history. Unless I override, grep/diff/etc. on history should restrict all output to files within the sparse specification. One potential way to (over)simplify this would be: Behavior A: `--scope=sparse` for both worktree and history operations Behavior B: `--scope=sparse` for worktree operations, `--scope=all` for history operations Behavior C: `--scope=all` for both worktree and history operations