Hi, On Wed, Mar 8, 2023 at 5:49 AM William Sprent via GitGitGadget <gitgitgadget@xxxxxxxxx> wrote: > > From: William Sprent <williams@xxxxxxxxxxx> > > There exists no direct way to interrogate git about which paths are > matched by a given set of sparsity rules. It is possible to get this > information from git, but it includes checking out the commit that > contains the paths, applying the sparse checkout patterns and then using > something like 'git ls-files -t' to check if the skip worktree bit is > set. This works in some case, but there are cases where it is awkward or > infeasible to generate a checkout for this purpose. > > Exposing the pattern matching of sparse checkout enables more tooling to > be built and avoids a situation where tools that want to reason about > sparse checkouts start containing parallel implementation of the rules. > To accommodate this, add a 'check-rules' subcommand to the > 'sparse-checkout' builtin along the lines of the 'git check-ignore' and > 'git check-attr' commands. The new command accepts a list of paths on > stdin and outputs just the ones the match the sparse checkout. > > To allow for use in a bare repository and to allow for interrogating > about other patterns than the current ones, include a '--rules-file' > option which allows the caller to explicitly pass sparse checkout rules > in the format accepted by 'sparse-checkout set --stdin'. > > To allow for reuse of the handling of input patterns for the > '--rules-file' flag, modify 'add_patterns_from_input()' to be able to > read from a 'FILE' instead of just stdin. > > To allow for reuse of the logic which decides whether or not rules > should be interpreted as cone-mode patterns, split that part out of > 'update_modes()' such that can be called without modifying the config. > > An alternative could have been to create a new 'check-sparsity' command. > However, placing it under 'sparse-checkout' allows for a) more easily > re-using the sparse checkout pattern matching and cone/non-code mode > handling, and b) keeps the documentation for the command next to the > experimental warning and the cone-mode discussion. > > Signed-off-by: William Sprent <williams@xxxxxxxxxxx> > --- > Documentation/git-sparse-checkout.txt | 23 ++++- > builtin/sparse-checkout.c | 126 +++++++++++++++++++++---- > t/t1091-sparse-checkout-builtin.sh | 129 +++++++++++++++++++++++++- > 3 files changed, 255 insertions(+), 23 deletions(-) > > diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt > index 68392d2a56e..8fdde8f53be 100644 > --- a/Documentation/git-sparse-checkout.txt > +++ b/Documentation/git-sparse-checkout.txt > @@ -9,7 +9,7 @@ git-sparse-checkout - Reduce your working tree to a subset of tracked files > SYNOPSIS > -------- > [verse] > -'git sparse-checkout' (init | list | set | add | reapply | disable) [<options>] > +'git sparse-checkout' (init | list | set | add | reapply | disable | check-rules) [<options>] > > > DESCRIPTION > @@ -135,6 +135,27 @@ paths to pass to a subsequent 'set' or 'add' command. However, > the disable command, so the easy restore of calling a plain `init` > decreased in utility. > > +'check-rules':: > + Check whether sparsity rules match one or more paths. > ++ > +By default `check-rules` reads a list of paths from stdin and outputs only > +the ones that match the current sparsity rules. The input is expected to consist > +of one path per line, matching the output of `git ls-tree --name-only` including > +that pathnames that begin with a double quote (") are interpreted C-style quoted > +strings. > ++ > +When called with the `--rules-file <file>` the input files are matched against > +the sparse checkout rules found in `<file>` instead of the current ones. The > +rules in the files are expected to be in the same form as accepted by `git > +sparse-checkout set --stdin`. Could we add a "(in particular, they must be newline-delimited)" to the end of that paragraph? I built your series locally to test, and my first attempt was: $ git sparse-checkout set t reftable $ echo -e "t/helper/test-userdiff.c\nreftable/basics.c\nmerge-ort.c\nDocumentation/give-up.txt" | git sparse-checkout check-rules t/helper/test-userdiff.c reftable/basics.c merge-ort.c ...which did exactly what I expected and wanted. So, I decided to undo the sparse checkout and use the --rules-file option: $ echo -e "t/helper/test-userdiff.c\nreftable/basics.c\nmerge-ort.c\nDocumentation/give-up.txt" | git sparse-checkout check-rules --rules-file <(echo t reftable) merge-ort.c I was perplexed. I thought it was a bug and was typing up explaining why this was a problem, before I re-read this sentence of your documentation closer and realized I had just flubbed it. I should have ran: $ echo -e "t/helper/test-userdiff.c\nreftable/basics.c\nmerge-ort.c\nDocumentation/give-up.txt" | git sparse-checkout check-rules --rules-file <(echo -e "t\nreftable") t/helper/test-userdiff.c reftable/basics.c merge-ort.c Given that this wasn't obvious to me after reading the Documentation (it probably should have been but just wasn't), might it be worth calling out? > ++ > +The `--rules-file` flag can be combined with the `--[no]-cone` with the same > +effect as for the `set` command with the `--stdin` flag. > ++ > +When called with the `-z` flag the input format and output format is \0 > +terminated and not quoted. I believe this does not affect the format of the --rules-file option...but maybe it should? Whether it does or doesn't, do we want to call it out when explaining -z's behavior? > + > EXAMPLES > -------- > `git sparse-checkout set MY/DIR1 SUB/DIR2`:: > diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c > index 5fdc3d9aab5..969ae14a415 100644 > --- a/builtin/sparse-checkout.c > +++ b/builtin/sparse-checkout.c > @@ -20,7 +20,7 @@ > static const char *empty_base = ""; > > static char const * const builtin_sparse_checkout_usage[] = { > - N_("git sparse-checkout (init | list | set | add | reapply | disable) [<options>]"), > + N_("git sparse-checkout (init | list | set | add | reapply | disable | check-rules) [<options>]"), > NULL > }; > > @@ -384,13 +384,7 @@ static int set_config(enum sparse_checkout_mode mode) > return 0; > } > > -static int update_modes(int *cone_mode, int *sparse_index) > -{ > - int mode, record_mode; > - > - /* Determine if we need to record the mode; ensure sparse checkout on */ > - record_mode = (*cone_mode != -1) || !core_apply_sparse_checkout; > - > +static enum sparse_checkout_mode update_cone_mode(int *cone_mode) { > /* If not specified, use previous definition of cone mode */ > if (*cone_mode == -1 && core_apply_sparse_checkout) > *cone_mode = core_sparse_checkout_cone; > @@ -398,12 +392,21 @@ static int update_modes(int *cone_mode, int *sparse_index) > /* Set cone/non-cone mode appropriately */ > core_apply_sparse_checkout = 1; > if (*cone_mode == 1 || *cone_mode == -1) { > - mode = MODE_CONE_PATTERNS; > core_sparse_checkout_cone = 1; > - } else { > - mode = MODE_ALL_PATTERNS; > - core_sparse_checkout_cone = 0; > + return MODE_CONE_PATTERNS; > } > + core_sparse_checkout_cone = 0; > + return MODE_ALL_PATTERNS; > +} > + > +static int update_modes(int *cone_mode, int *sparse_index) > +{ > + int mode, record_mode; > + > + /* Determine if we need to record the mode; ensure sparse checkout on */ > + record_mode = (*cone_mode != -1) || !core_apply_sparse_checkout; > + > + mode = update_cone_mode(cone_mode); > if (record_mode && set_config(mode)) > return 1; > > @@ -547,7 +550,7 @@ static void strbuf_to_cone_pattern(struct strbuf *line, struct pattern_list *pl) > > static void add_patterns_from_input(struct pattern_list *pl, > int argc, const char **argv, > - int use_stdin) > + FILE *file) > { > int i; > if (core_sparse_checkout_cone) { > @@ -557,9 +560,9 @@ static void add_patterns_from_input(struct pattern_list *pl, > hashmap_init(&pl->parent_hashmap, pl_hashmap_cmp, NULL, 0); > pl->use_cone_patterns = 1; > > - if (use_stdin) { > + if (file) { > struct strbuf unquoted = STRBUF_INIT; > - while (!strbuf_getline(&line, stdin)) { > + while (!strbuf_getline(&line, file)) { > if (line.buf[0] == '"') { > strbuf_reset(&unquoted); > if (unquote_c_style(&unquoted, line.buf, NULL)) > @@ -581,10 +584,10 @@ static void add_patterns_from_input(struct pattern_list *pl, > } > } > } else { > - if (use_stdin) { > + if (file) { > struct strbuf line = STRBUF_INIT; > > - while (!strbuf_getline(&line, stdin)) { > + while (!strbuf_getline(&line, file)) { > size_t len; > char *buf = strbuf_detach(&line, &len); > add_pattern(buf, empty_base, 0, pl, 0); > @@ -611,7 +614,8 @@ static void add_patterns_cone_mode(int argc, const char **argv, > struct pattern_list existing; > char *sparse_filename = get_sparse_checkout_filename(); > > - add_patterns_from_input(pl, argc, argv, use_stdin); > + add_patterns_from_input(pl, argc, argv, > + use_stdin ? stdin : NULL); > > memset(&existing, 0, sizeof(existing)); > existing.use_cone_patterns = core_sparse_checkout_cone; > @@ -648,7 +652,7 @@ static void add_patterns_literal(int argc, const char **argv, > pl, NULL, 0)) > die(_("unable to load existing sparse-checkout patterns")); > free(sparse_filename); > - add_patterns_from_input(pl, argc, argv, use_stdin); > + add_patterns_from_input(pl, argc, argv, use_stdin ? stdin : NULL); > } > > static int modify_pattern_list(int argc, const char **argv, int use_stdin, > @@ -667,7 +671,8 @@ static int modify_pattern_list(int argc, const char **argv, int use_stdin, > break; > > case REPLACE: > - add_patterns_from_input(pl, argc, argv, use_stdin); > + add_patterns_from_input(pl, argc, argv, > + use_stdin ? stdin : NULL); > break; > } > > @@ -929,6 +934,86 @@ static int sparse_checkout_disable(int argc, const char **argv, > return set_config(MODE_NO_PATTERNS); > } > > +static char const * const builtin_sparse_checkout_check_rules_usage[] = { > + N_("git sparse-checkout check-rules [-z] [--skip-checks]" > + "[--[no-]cone] [--rules-file <file>]"), > + NULL > +}; > + > +static struct sparse_checkout_check_rules_opts { > + int cone_mode; > + int null_termination; > + char *rules_file; > +} check_rules_opts; > + > +static int check_rules(struct pattern_list *pl, int null_terminated) { > + struct strbuf line = STRBUF_INIT; > + struct strbuf unquoted = STRBUF_INIT; > + char *path; > + int line_terminator = null_terminated ? 0 : '\n'; > + strbuf_getline_fn getline_fn = null_terminated ? strbuf_getline_nul > + : strbuf_getline; > + the_repository->index->sparse_checkout_patterns = pl; > + while (!getline_fn(&line, stdin)) { strbuf_getline_nul() and strbuf_getline() are documented as overwriting their strbuf, so there is no need to clear line. > + path = line.buf; > + if (!null_terminated && line.buf[0] == '"') { > + strbuf_reset(&unquoted); > + if (unquote_c_style(&unquoted, line.buf, NULL)) unquote_c_style() is documented as appending to its strbuf, but you handle that via the strbuf_reset() on the line before. > + die(_("unable to unquote C-style string '%s'"), > + line.buf); > + > + path = unquoted.buf; > + } > + > + if (path_in_sparse_checkout(path, the_repository->index)) > + write_name_quoted(path, stdout, line_terminator); > + } Do you need to call strbuf_release() for both line and unquoted here? I think you might have a small leak. > + > + return 0; > +} > + > +static int sparse_checkout_check_rules(int argc, const char **argv, const char *prefix) > +{ > + static struct option builtin_sparse_checkout_check_rules_options[] = { > + OPT_BOOL('z', NULL, &check_rules_opts.null_termination, > + N_("terminate input and output files by a NUL character")), > + OPT_BOOL(0, "cone", &check_rules_opts.cone_mode, > + N_("when used with --rules-file interpret patterns as cone mode patterns")), > + OPT_FILENAME(0, "rules-file", &check_rules_opts.rules_file, > + N_("use patterns in <file> instead of the current ones.")), > + OPT_END(), > + }; > + > + FILE *fp; > + int ret; > + struct pattern_list pl = {0}; > + char *sparse_filename; > + check_rules_opts.cone_mode = -1; > + > + argc = parse_options(argc, argv, prefix, > + builtin_sparse_checkout_check_rules_options, > + builtin_sparse_checkout_check_rules_usage, > + PARSE_OPT_KEEP_UNKNOWN_OPT); > + > + update_cone_mode(&check_rules_opts.cone_mode); > + pl.use_cone_patterns = core_sparse_checkout_cone; > + if (check_rules_opts.rules_file) { > + fp = xfopen(check_rules_opts.rules_file, "r"); > + add_patterns_from_input(&pl, argc, argv, fp); > + fclose(fp); > + } else { > + sparse_filename = get_sparse_checkout_filename(); > + if (add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, > + NULL, 0)) > + die(_("unable to load existing sparse-checkout patterns")); > + free(sparse_filename); > + } > + > + ret = check_rules(&pl, check_rules_opts.null_termination); > + clear_pattern_list(&pl); > + return ret; > +} > + > int cmd_sparse_checkout(int argc, const char **argv, const char *prefix) > { > parse_opt_subcommand_fn *fn = NULL; > @@ -939,6 +1024,7 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix) > OPT_SUBCOMMAND("add", &fn, sparse_checkout_add), > OPT_SUBCOMMAND("reapply", &fn, sparse_checkout_reapply), > OPT_SUBCOMMAND("disable", &fn, sparse_checkout_disable), > + OPT_SUBCOMMAND("check-rules", &fn, sparse_checkout_check_rules), > OPT_END(), > }; > > diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh > index 7216267aec7..521dc914fa7 100755 > --- a/t/t1091-sparse-checkout-builtin.sh > +++ b/t/t1091-sparse-checkout-builtin.sh > @@ -555,7 +555,7 @@ test_expect_success 'cone mode: set with core.ignoreCase=true' ' > check_files repo a folder1 > ' > > -test_expect_success 'interaction with submodules' ' > +test_expect_success 'setup submodules' ' > git clone repo super && > ( > cd super && > @@ -566,11 +566,22 @@ test_expect_success 'interaction with submodules' ' > git commit -m "add submodule" && > git sparse-checkout init --cone && > git sparse-checkout set folder1 > - ) && > + ) > +' > + > +test_expect_success 'interaction with submodules' ' > check_files super a folder1 modules && > check_files super/modules/child a deep folder1 folder2 > ' > > +test_expect_success 'check-rules interaction with submodules' ' > + git -C super ls-tree --name-only -r HEAD >all-files && > + git -C super sparse-checkout check-rules >check-rules-matches <all-files && > + > + test_i18ngrep ! "modules/" check-rules-matches && > + test_i18ngrep "folder1/" check-rules-matches This file already uses test_i18ngrep, so continuing to use it is okay just for consistency. But in general test_i18ngrep is deprecated ("Should not be used and will be removed soon." according to the docs; soon is relative) and we prefer to just use grep. > +' > + > test_expect_success 'different sparse-checkouts with worktrees' ' > git -C repo sparse-checkout set --cone deep folder1 && > git -C repo worktree add --detach ../worktree && > @@ -915,4 +926,118 @@ test_expect_success 'disable fails outside work tree' ' > test_i18ngrep "this operation must be run in a work tree" err > ' > > +test_expect_success 'setup clean' ' > + git -C repo clean -fdx > +' > + > +test_expect_success 'check-rules cone mode' ' > + cat >rules <<-\EOF && > + folder1 > + deep/deeper1/deepest > + EOF > + > + git -C bare ls-tree -r --name-only HEAD >all-files && > + git -C bare sparse-checkout check-rules --cone \ > + --rules-file ../rules >check-rules-file <all-files && > + > + git -C repo sparse-checkout set --cone --stdin <rules&& > + git -C repo ls-files -t >out && > + sed -n "/^S /!s/^. //p" out >ls-files && > + > + git -C repo sparse-checkout check-rules >check-rules-default <all-files && > + > + test_i18ngrep "deep/deeper1/deepest/a" check-rules-file && > + test_i18ngrep ! "deep/deeper2" check-rules-file && > + > + test_cmp check-rules-file ls-files && > + test_cmp check-rules-file check-rules-default > +' > + > +test_expect_success 'check-rules non-cone mode' ' > + cat >rules <<-\EOF && > + deep/deeper1/deepest/a > + EOF > + > + git -C bare ls-tree -r --name-only HEAD >all-files && > + git -C bare sparse-checkout check-rules --no-cone --rules-file ../rules\ > + >check-rules-file <all-files && > + > + cat rules | git -C repo sparse-checkout set --no-cone --stdin && > + git -C repo ls-files -t >out && > + sed -n "/^S /!s/^. //p" out >ls-files && > + > + git -C repo sparse-checkout check-rules >check-rules-default <all-files && > + > + cat >expect <<-\EOF && > + deep/deeper1/deepest/a > + EOF > + > + test_cmp expect check-rules-file && > + test_cmp check-rules-file ls-files && > + test_cmp check-rules-file check-rules-default > +' > + > +test_expect_success 'check-rules cone mode is default' ' > + cat >rules <<-\EOF && > + folder1 > + EOF > + > + cat >all-files <<-\EOF && > + toplevel > + folder2/file > + folder1/file > + EOF > + > + cat >expect <<-\EOF && > + toplevel > + folder1/file > + EOF > + > + git -C bare sparse-checkout check-rules \ > + --rules-file ../rules >actual <all-files && > + > + test_cmp expect actual This test is checking that despite whatever sparse-checkout setting we have, that the --rules-files option makes it ignore those rules and use the specified one instead, and in particular that cone mode is assumed when nothing is specified with --rules-file. The test is stronger because the test right before this one did a non-cone-mode checkout. But since this test didn't itself do a non-cone-mode checkout, that extra strength of the test could be lost by a future contributor inserting another test between these. It's not a big deal, but perhaps it'd make sense to have this test add a `git sparse-checkout set --no-cone ...` call near the beginning? > +' > + > +test_expect_success 'check-rules quoting' ' > + cat >rules <<-EOF && > + "folder\" a" > + EOF > + cat >files <<-EOF && > + "folder\" a/file" > + "folder\" b/file" > + EOF > + cat >expect <<-EOF && > + "folder\" a/file" > + EOF > + git sparse-checkout check-rules --cone \ > + --rules-file rules >actual <files && > + > + test_cmp expect actual > +' > + > +test_expect_success 'check-rules null termination' ' > + cat >rules <<-EOF && > + "folder\" a" > + EOF > + > + lf_to_nul >files <<-EOF && > + folder" a/a > + folder" a/b > + folder" b/fileQ > + EOF > + > + cat >expect <<-EOF && > + folder" a/aQfolder" a/bQ > + EOF > + > + git sparse-checkout check-rules --cone -z \ > + --rules-file rules >actual.nul <files && > + nul_to_q <actual.nul >actual && > + echo >>actual && > + > + test_cmp expect actual > +' > + > + > test_done > -- > gitgitgadget Overall, nicely done. I only found minor things to comment on, and like the overall design, explanation, and implementation.