William Sprent via GitGitGadget wrote: > From: William Sprent <williams@xxxxxxxxxxx> > > The sparse checkout documentation states that the cone mode pattern set > is limited to patterns that either recursively include directories or > patterns that match all files in a directory. In the sparse checkout > file, the former manifest in the form: > > /A/B/C/ > > while the latter become a pair of patterns either in the form: > > /A/B/ > !/A/B/*/ > > or in the special case of matching the toplevel files: > > /* > !/*/ > > The 'add_pattern_to_hashsets()' function contains checks which serve to > disable cone-mode when non-cone patterns are encountered. However, these > do not catch when the pattern list attempts to match a single file or > directory, e.g. a pattern in the form: > > /A/B/C > > This causes sparse-checkout to exhibit unexpected behaviour when such a > pattern is in the sparse-checkout file and cone mode is enabled. > Concretely, with the pattern like the above, sparse-checkout, in > non-cone mode, will only include the directory or file located at > '/A/B/C'. However, with cone mode enabled, sparse-checkout will instead > just manifest the toplevel files but not any file located at '/A/B/C'. > > Relatedly, issues occur when supplying the same kind of filter when > partial cloning with '--filter=sparse:oid=<oid>'. 'upload-pack' will > correctly just include the objects that match the non-cone pattern > matching. Which means that checking out the newly cloned repo with the > same filter, but with cone mode enabled, fails due to missing objects. This clearly explains the issue you're trying to fix: cone mode sparse-checkout requires patterns like '/A/B/', !/A/B/*/', '/A/B/C/', etc., but invalid patterns like '/A/B/C' (no trailing slash) currently don't force the switch to non-cone mode, leading to unexpected behavior. > > To fix these issues, add a cone mode pattern check that asserts that > every pattern is either a directory match or the pattern '/*'. Add a > test to verify the new pattern check and modify another to reflect that > non-directory patterns are caught earlier. I think this is the best way to maintain the current intended behavior of cone mode sparse-checkout. While the idea of single file "exceptions" to cone patterns has been brought up in the past (I think most recently at the Contributor's Summit this past September [1]), it'd be a substantial change that's definitely out-of-scope of this small bugfix. [1] https://lore.kernel.org/git/YzXwOsaoCdBhHsX1@nand.local/ > > Signed-off-by: William Sprent <williams@xxxxxxxxxxx> > --- > dir: check for single file cone patterns > > Resubmitting this without the superfluous double negation of the > 'strcmp' output. > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1446%2Fwilliams-unity%2Fws%2Fsparse-checkout-pattern-match-v2 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1446/williams-unity/ws/sparse-checkout-pattern-match-v2 > Pull-Request: https://github.com/gitgitgadget/git/pull/1446 > > Range-diff vs v1: > > 1: cfb4b75c378 ! 1: d5406e62f6f dir: check for single file cone patterns > @@ dir.c: static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_ > } > > + if (!(given->flags & PATTERN_FLAG_MUSTBEDIR) && > -+ !!strcmp(given->pattern, "/*")) { > ++ strcmp(given->pattern, "/*")) { > + /* Not a cone pattern. */ > + warning(_("unrecognized pattern: '%s'"), given->pattern); > + goto clear_hashmaps; > > > dir.c | 7 +++++++ > t/t1091-sparse-checkout-builtin.sh | 11 ++++++++++- > 2 files changed, 17 insertions(+), 1 deletion(-) > > diff --git a/dir.c b/dir.c > index fbdb24fc819..4e99f0c868f 100644 > --- a/dir.c > +++ b/dir.c > @@ -732,6 +732,13 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern > goto clear_hashmaps; > } > > + if (!(given->flags & PATTERN_FLAG_MUSTBEDIR) && > + strcmp(given->pattern, "/*")) { This condition is only entered if: - the pattern *does not* end in a trailing slash ('!(given->flags & PATTERN_FLAG_MUSTBEDIR)'), AND - the pattern *does not* start with '/*' ('strcmp(given->pattern, "/*")') The '/*'-like patterns are already handled by other parts of 'add_pattern_to_hashsets()', so only the '/A/B/C'-like patterns will trigger the "unrecognized" warning. Looks good! > + /* Not a cone pattern. */ > + warning(_("unrecognized pattern: '%s'"), given->pattern); > + goto clear_hashmaps; > + } > + > prev = given->pattern; > cur = given->pattern + 1; > next = given->pattern + 2; > diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh > index b563d6c263e..627267be153 100755 > --- a/t/t1091-sparse-checkout-builtin.sh > +++ b/t/t1091-sparse-checkout-builtin.sh > @@ -238,7 +238,7 @@ test_expect_success 'cone mode: match patterns' ' > test_expect_success 'cone mode: warn on bad pattern' ' > test_when_finished mv sparse-checkout repo/.git/info/ && > cp repo/.git/info/sparse-checkout . && > - echo "!/deep/deeper/*" >>repo/.git/info/sparse-checkout && > + echo "!/deep/deeper/*/" >>repo/.git/info/sparse-checkout && This test changes because, without the trailing slash on the pattern, the error would become "unrecognized pattern: !/deep/deeper/*" due to the new check in 'add_pattern_to_hashsets()'. Changing the test pattern ensures we're still exercising what the test was originally intended to exercise, and doesn't indicate a regression. > git -C repo read-tree -mu HEAD 2>err && > test_i18ngrep "unrecognized negative pattern" err > ' > @@ -667,6 +667,15 @@ test_expect_success 'pattern-checks: starting "*"' ' > check_read_tree_errors repo "a deep" "disabling cone pattern matching" > ' > > +test_expect_success 'pattern-checks: non directory pattern' ' > + cat >repo/.git/info/sparse-checkout <<-\EOF && > + /deep/deeper1/a > + EOF > + check_read_tree_errors repo deep "disabling cone pattern matching" && > + check_files repo/deep deeper1 && > + check_files repo/deep/deeper1 a > +' And this test ensures the new check is working for the appropriate patterns. This patch looks good to me, thanks for finding & fixing the bug! > + > test_expect_success 'pattern-checks: contained glob characters' ' > for c in "[a]" "\\" "?" "*" > do > > base-commit: 7c2ef319c52c4997256f5807564523dfd4acdfc7