On Tue, Feb 22, 2022 at 6:26 PM Jonathan Nieder <jrnieder@xxxxxxxxx> wrote: > > From: Elijah Newren <newren@xxxxxxxxx> > > Typically with sparse checkouts, we expect files outside the sparsity > patterns to be marked as SKIP_WORKTREE and be missing from the working > tree. In edge cases, this can be violated and cause confusion, so in > a sparse checkout, since 11d46a399d ("repo_read_index: clear > SKIP_WORKTREE bit from files present in worktree", 2022-01-06), Git > automatically clears the SKIP_WORKTREE bit at read time for entries > corresponding to files that are present in the working tree. > > However, there is a more atypical situation where this situation would > be expected. A Git-aware virtual file system[1] takes advantage of > its position as a file system driver to expose all files in the > working tree, fetch them on demand using partial clone on access, and > tell Git to pay attention to them on demand by updating the sparse > checkout pattern on writes. This means that commands like "git > status" only has to examine files that have potentially been modified, > whereas commands like "ls" are able to show the entire codebase > without requiring manual updates to the sparse checkout pattern. Should that be s/commands/a command/ or else s/only has/only have/? > Thus since 11d46a399d, Git with such Git-aware virtual file systems > unsets the SKIP_WORKTREE bit for all files and commands like "git > status" have to fetch and examine them all. > > Introduce a configuration setting sparse.expectFilesOutsideOfPatterns > to allow limiting the tracked set of files to a small set once again. > A Git-aware virtual file system or other application that wants to > maintain files outside of the sparse checkout can set this in a > repository to instruct Git not to check for the presence of > SKIP_WORKTREE files. The setting defaults to false, so most users of > sparse checkout will still get the benefit of an automatically > updating index to recover from interrupted updates that forget to Please don't presume that these only come from interrupted updates. As per the referenced 11d46a399d that started all this: """ There are various ways for users to get files to be present in the working copy despite having the SKIP_WORKTREE bit set for that file in the index. This may come from: * various git commands not really supporting the SKIP_WORKTREE bit[1,2] * users grabbing files from elsewhere and writing them to the worktree (perhaps even cached in their editor) * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. """ > delete some files or unset SKIP_WORKTREE for them. The only problem 11d46a399d corrects is having the SKIP_WORKTREE being *set* despite the file being present. So the "recover from...updates that...unset SKIP_WORKTREE" doesn't make any sense. > [1] such as the vfsd described in > https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@xxxxxxxxxx/ > > [jn: fleshed out commit message and documentation, added missing > include to config.txt, moved to a separate config callback] > > Signed-off-by: Elijah Newren <newren@xxxxxxxxx> > Signed-off-by: Jonathan Nieder <jrnieder@xxxxxxxxx> > --- > Some minor updates, but this is basically the same as the patch you > sent. Thoughts? ...and it looks even more like the v2 I was about to send, so it seems like we're pretty much on the same page. :-) Overall, it looks good, and your version has some things that are nicer than mine, but I did have a couple small notes on the commit message above and a similar one on the config description below. > Thanks, > Jonathan > > Documentation/config.txt | 2 ++ > Documentation/config/sparse.txt | 24 ++++++++++++++++++++++++ > cache.h | 1 + > config.c | 14 ++++++++++++++ > environment.c | 1 + > sparse-index.c | 3 ++- > t/t1090-sparse-checkout-scope.sh | 19 +++++++++++++++++++ > 7 files changed, 63 insertions(+), 1 deletion(-) > create mode 100644 Documentation/config/sparse.txt > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index b168f02dc3d..8628ae2634d 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -468,6 +468,8 @@ include::config/sequencer.txt[] > > include::config/showbranch.txt[] > > +include::config/sparse.txt[] > + > include::config/splitindex.txt[] > > include::config/ssh.txt[] > diff --git a/Documentation/config/sparse.txt b/Documentation/config/sparse.txt > new file mode 100644 > index 00000000000..c790c728276 > --- /dev/null > +++ b/Documentation/config/sparse.txt > @@ -0,0 +1,24 @@ > +sparse.expectFilesOutsideOfPatterns:: > + Typically with sparse checkouts, files not matching any > + sparsity patterns are marked as such in the index file and > + missing from the working tree. Accordingly, Git will > + ordinarily check whether files that the index indicates are > + outside of the sparse area are present in the working tree and > + mark them as present in the index if so. This option can be > + used to tell Git that such present-but-unmatching files are > + expected and to stop checking for them. > ++ > +The default is `false`. Leaving this set to `false` is recommended in > +most situations because it allows Git to recover from an interrupted > +operation that updated the working tree without updating the index or > +vice versa. Again, please don't claim this is only for recovering from interrupted operations; there are other cases -- Git commands that write the working tree but not the index (checkout-index, git apply), and users mucking around with files, for example. > ++ > +A Git-based virtual file system (VFS) can turn the usual expectation > +on its head: files are present in the working copy but do not take > +up much disk space because their contents are not downloaded until > +they are accessed. With such a virtual file system layer, most files > +do not match the sparsity patterns at first, and the VFS layer > +updates the sparsity patterns to add more files whenever files are > +written. Setting this to `true` supports such a setup where files are > +expected to be present outside the sparse area and a separate, robust > +mechanism is responsible for keeping the sparsity patterns up to date. > diff --git a/cache.h b/cache.h > index 281f00ab1b1..b6b8e83ae35 100644 > --- a/cache.h > +++ b/cache.h > @@ -1003,6 +1003,7 @@ extern const char *core_fsmonitor; > > extern int core_apply_sparse_checkout; > extern int core_sparse_checkout_cone; > +extern int sparse_expect_files_outside_of_patterns; > > /* > * Returns the boolean value of $GIT_OPTIONAL_LOCKS (or the default value). > diff --git a/config.c b/config.c > index 2bffa8d4a01..9b9ad1500aa 100644 > --- a/config.c > +++ b/config.c > @@ -1544,6 +1544,17 @@ static int git_default_core_config(const char *var, const char *value, void *cb) > return platform_core_config(var, value, cb); > } > > +static int git_default_sparse_config(const char *var, const char *value) > +{ > + if (!strcmp(var, "sparse.expectfilesoutsideofpatterns")) { > + sparse_expect_files_outside_of_patterns = git_config_bool(var, value); > + return 0; > + } > + > + /* Add other config variables here and to Documentation/config/sparse.txt. */ > + return 0; > +} > + > static int git_default_i18n_config(const char *var, const char *value) > { > if (!strcmp(var, "i18n.commitencoding")) > @@ -1675,6 +1686,9 @@ int git_default_config(const char *var, const char *value, void *cb) > return 0; > } > > + if (starts_with(var, "sparse.")) > + return git_default_sparse_config(var, value); > + > /* Add other config variables here and to Documentation/config.txt. */ > return 0; > } > diff --git a/environment.c b/environment.c > index fd0501e77a5..fb55bf61290 100644 > --- a/environment.c > +++ b/environment.c > @@ -70,6 +70,7 @@ char *notes_ref_name; > int grafts_replace_parents = 1; > int core_apply_sparse_checkout; > int core_sparse_checkout_cone; > +int sparse_expect_files_outside_of_patterns; > int merge_log_config = -1; > int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */ > unsigned long pack_size_limit_cfg; > diff --git a/sparse-index.c b/sparse-index.c > index eed170cd8f7..daeb5112a18 100644 > --- a/sparse-index.c > +++ b/sparse-index.c > @@ -396,7 +396,8 @@ void clear_skip_worktree_from_present_files(struct index_state *istate) > > int i; > > - if (!core_apply_sparse_checkout) > + if (!core_apply_sparse_checkout || > + sparse_expect_files_outside_of_patterns) > return; > > restart: > diff --git a/t/t1090-sparse-checkout-scope.sh b/t/t1090-sparse-checkout-scope.sh > index 3deb4901874..d1833c0f31b 100755 > --- a/t/t1090-sparse-checkout-scope.sh > +++ b/t/t1090-sparse-checkout-scope.sh > @@ -52,6 +52,25 @@ test_expect_success 'return to full checkout of main' ' > test "$(cat b)" = "modified" > ' > > +test_expect_success 'skip-worktree on files outside sparse patterns' ' > + git sparse-checkout disable && > + git sparse-checkout set --no-cone "a*" && > + git checkout-index --all --ignore-skip-worktree-bits && > + > + git ls-files -t >output && > + ! grep ^S output >actual && > + test_must_be_empty actual && > + > + test_config sparse.expectFilesOutsideOfPatterns true && > + cat <<-\EOF >expect && > + S b > + S c > + EOF > + git ls-files -t >output && > + grep ^S output >actual && > + test_cmp expect actual > +' > + > test_expect_success 'in partial clone, sparse checkout only fetches needed blobs' ' > test_create_repo server && > git clone "file://$(pwd)/server" client && > -- > 2.35.1.574.g5d30c73bfb