On 2/18/2022 8:06 PM, Jonathan Nieder wrote: > (cc-ing Jonathan Tan, Jose Lopes, and Jeff Hostetler, vfs experts) > Hi Elijah, > > Elijah Newren wrote[1]: > >> The fix is short (~30 lines), but the description is not. Sorry. >> >> There is a set of problems caused by files in what I'll refer to as the >> "present-despite-SKIP_WORKTREE" state. This commit aims to not just fix >> these problems, but remove the entire class as a possibility -- for >> those using sparse checkouts. But first, we need to understand the >> problems this class presents. A quick outline: >> >> * Problems >> * User facing issues >> * Problem space complexity >> * Maintenance and code correctness challenges >> * SKIP_WORKTREE expectations in Git >> * Suggested solution >> * Pros/Cons of suggested solution >> * Notes on testcase modifications > > Thanks for a clear explanation! This is very helpful. > >> === User facing issues === >> >> There are various ways for users to get files to be present in the >> working copy despite having the SKIP_WORKTREE bit set for that file in >> the index. This may come from: >> * various git commands not really supporting the SKIP_WORKTREE bit[1,2] >> * users grabbing files from elsewhere and writing them to the worktree >> (perhaps even cached in their editor) >> * users attempting to "abort" a sparse-checkout operation with a >> not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the >> working tree is not atomic)[3]. >> >> Once users have present-despite-SKIP_WORKTREE files, any modifications >> users make to these files will be ignored, possibly to users' confusion. > [...] >> The suggests a simple solution: present-despite-SKIP_WORKTREE files >> should not exist, for those using sparse-checkouts. > > This patch just reached "next", so at $DAYJOB a test for our vfsd[2] > noticed this change. The trick behind a Git-based virtual filesystem > is typically: > > - since we control the filesystem, we can pretend all files are > already present. On access, we obtain the file content from the git > object store. On write, we update the sparse-checkout pattern so > that Git knows to start tracking the file. > > - by keeping the sparse-checkout pattern narrow, we minimize the time > commands like "git status" need to spend looking for changes in > unmodified files. Controlling the filesystem means we don't need to > worry about changes to files that don't match that pattern (since > any modification would also trigger a sparse-checkout pattern > update). > > If I understand the intent behind this change correctly, it's > incompatible with that trick. How would you recommend handling that? > In the not too far away future, I'd expect something like the "VFS > projection hook" to handle this use case, but in the meantime, I would > expect this change to break VFS for Git performance. A few options: > > a. We could guard it with a config option. It would still be a > regression for VFS for Git users, but they'd be able to use the > config option to restore the expected behavior. (Or > alternatively, such a config option could be disabled by default, > but I suspect that would defeat the purpose described for the > patch.) > > b. We could distinguish between the vfsd and the "interrupted and > forgot to update SKIP_WORKTREE bits in the index" cases somehow. > This sounds complex. > > c. Something else? > > (b) and (c) aren't sounding obviously good, so (a) seems tempting. > What do you think? Just chiming in here to say that we've dealt with these issues in microsoft/git by special-casing them behind our core_virtualfilesystem global. A recent example is the changes to 'git add' to prevent adding a file that is marked as sparse (unless --sparse is specified). We always allow this when in the virtualized scenario [1]. [1] https://github.com/microsoft/git/blob/2f6531aced2e77a6c1000a923967ae0105383930/builtin/add.c#L50-L54 This seems like something that should be on vfsd to handle, and should not prevent upstream Git from making changes that benefit its users. Thanks, -Stolee