On Mon, Jan 23 2023, William Sprent via GitGitGadget wrote: > From: William Sprent <williams@xxxxxxxxxxx> ...some further inline commentary, in addition to my proposed squash-in... > diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt > index 0240adb8eec..724bb1f9dbd 100644 > --- a/Documentation/git-ls-tree.txt > +++ b/Documentation/git-ls-tree.txt > @@ -11,6 +11,7 @@ SYNOPSIS > [verse] > 'git ls-tree' [-d] [-r] [-t] [-l] [-z] > [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]] [--format=<format>] > + [--sparse-filter-oid=<blob-ish>] > <tree-ish> [<path>...] > > DESCRIPTION > @@ -48,6 +49,11 @@ OPTIONS > Show tree entries even when going to recurse them. Has no effect > if `-r` was not passed. `-d` implies `-t`. > > +--sparse-filter-oid=<blob-ish>:: > + Omit showing tree objects and paths that do not match the > + sparse-checkout specification contained in the blob > + (or blob-expression) <blob-ish>. > + > -l:: > --long:: > Show object size of blob (file) entries. > diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c > index 72eb70823d5..46a815fc7dc 100644 > --- a/builtin/ls-tree.c > +++ b/builtin/ls-tree.c > @@ -13,6 +13,7 @@ > #include "builtin.h" > #include "parse-options.h" > #include "pathspec.h" > +#include "dir.h" > > static const char * const ls_tree_usage[] = { > N_("git ls-tree [<options>] <tree-ish> [<path>...]"), > @@ -364,6 +365,65 @@ static struct ls_tree_cmdmode_to_fmt ls_tree_cmdmode_format[] = { > }, > }; Okey, here yo uupdate the *.txt, but not the "-h", but it's one of those where way say <options>. I for one wouldn't mind some preceding change like my 423be1f83c5 (doc txt & -h consistency: make "commit" consistent, 2022-10-13), which in turn would make t/t0450-txt-doc-vs-help.sh pass for this command, but maybe it's too much of a digression... > + (*d) = xcalloc(1, sizeof(**d)); > + (*d)->fn = fn; > + (*d)->pl.use_cone_patterns = core_sparse_checkout_cone; I forgot to note in my proposed fix-up that I omitted this, but your tests still pass, which suggests it's either not needed, or your tests are lacking.... > + (*d)->options = options; > + strbuf_init(&(*d)->full_path_buf, 0); > + > + if (get_oid_with_context(the_repository, sparse_oid_name, GET_OID_BLOB, > + &sparse_oid, &oc)) > + die(_("unable to access sparse blob in '%s'"), sparse_oid_name); > + if (add_patterns_from_blob_to_list(&sparse_oid, "", 0, &(*d)->pl) < 0) As noted you can avoid the malloc here, which also sidesteps this (IMO at last) ugly &(*v)->m syntax. > + die(_("unable to parse sparse filter data in %s"), > + oid_to_hex(&sparse_oid)); > +} > + > +static void free_sparse_filter_data(struct sparse_filter_data *d) > +{ > + clear_pattern_list(&d->pl); > + strbuf_release(&d->full_path_buf); > + free(d); > +} > + > +static int path_matches_sparse_checkout_patterns(struct strbuf *full_path, struct pattern_list *pl, int dtype) > +{ > + enum pattern_match_result match = recursively_match_path_with_sparse_patterns(full_path->buf, the_repository->index, dtype, pl); I for one would welcome e.g. abbreviating "sparse_checkout_patterns" as "scp" or whatever throughout, with resulting shortening of very long lines & identifiers. E.g. for this one "patch_match_scp" or whatever. > + struct sparse_filter_data *data = context; > + Style: Don't add a \n\n between variable decls, > + int do_recurse = show_recursive(data->options, base->buf, base->len, pathname); Instead add it here, before the code proper. > + if (object_type(mode) == OBJ_TREE) > + return do_recurse; > + > + strbuf_reset(&data->full_path_buf); I wondered if we need this after, but we don't, I for one would welcome a fix-up like: diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c index 68d6ef675f2..74dfa9a4580 100644 --- a/builtin/ls-tree.c +++ b/builtin/ls-tree.c @@ -410,19 +410,21 @@ static int filter_sparse(const struct object_id *oid, struct strbuf *base, const char *pathname, unsigned mode, void *context) { struct sparse_filter_data *data = context; - int do_recurse = show_recursive(data->options, base->buf, base->len, pathname); + int ret = 0; + if (object_type(mode) == OBJ_TREE) return do_recurse; - strbuf_reset(&data->full_path_buf); strbuf_addbuf(&data->full_path_buf, base); strbuf_addstr(&data->full_path_buf, pathname); if (!path_matches_sparse_checkout_patterns(&data->full_path_buf, &data->pl, DT_REG)) - return 0; - - return data->fn(oid, base, pathname, mode, data->options); + goto cleanup; + ret = data->fn(oid, base, pathname, mode, data->options); +cleanup: + strbuf_reset(&data->full_path_buf); + return ret; } int cmd_ls_tree(int argc, const char **argv, const char *prefix) It's slightly more verbose, and we do end up needlessly reset-ing the "last" one, but makes it clear that we don't have need for this after this. Which to me, also raises the question of why you have this "full_path_buf" at all. It looks like a memory optimization, as you're trying to avoid a malloc()/free() in the loop. That's fair, but you could also do that with: const size_t oldlen = base->len; strbuf_addstr(base, pathname); /* do the path_matches_sparse_checkout_patterns() call here */ /* before "cleanup" */ strbuf_setlen(base, oldlen); Or whatever, those snippets are untested, but we use similar tricks elsewhere, and as it's contained to a few lines here I think it's better than carrying another buffer. > + strbuf_addbuf(&data->full_path_buf, base); > + strbuf_addstr(&data->full_path_buf, pathname); Nit: Shorter as: strbuf_addf(&sb, "%s%s", base.buf, pathname) (or equivalent); > + > + if (!path_matches_sparse_checkout_patterns(&data->full_path_buf, &data->pl, DT_REG)) > + return 0; > + > + return data->fn(oid, base, pathname, mode, data->options); > +} > + > int cmd_ls_tree(int argc, const char **argv, const char *prefix) > { > struct object_id oid; > @@ -372,7 +432,11 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix) > read_tree_fn_t fn = NULL; > enum ls_tree_cmdmode cmdmode = MODE_DEFAULT; > int null_termination = 0; > + Don't add this stray \n. > struct ls_tree_options options = { 0 }; > + char *sparse_oid_name = NULL; > + void *read_tree_context = NULL; > + struct sparse_filter_data *sparse_filter_data = NULL; > const struct option ls_tree_options[] = { > OPT_BIT('d', NULL, &options.ls_options, N_("only show trees"), > LS_TREE_ONLY), > @@ -399,6 +463,9 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix) > N_("format to use for the output"), > PARSE_OPT_NONEG), > OPT__ABBREV(&options.abbrev), > + OPT_STRING_F(0, "filter-sparse-oid", &sparse_oid_name, N_("filter-sparse-oid"), Make that N_(...) be "object-id", "oid", "hash" or something? I.e. the RHS with the <type> should be <thingy>, not <thingy-for-some-purpose>. > + N_("filter output with sparse-checkout specification contained in the blob"), > + PARSE_OPT_NONEG), > OPT_END() > }; > struct ls_tree_cmdmode_to_fmt *m2f = ls_tree_cmdmode_format; > @@ -474,7 +541,17 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix) > break; > } > > - ret = !!read_tree(the_repository, tree, &options.pathspec, fn, &options); > + if (sparse_oid_name) { > + init_sparse_filter_data(&sparse_filter_data, &options, sparse_oid_name, fn); > + read_tree_context = sparse_filter_data; > + fn = filter_sparse; > + } else > + read_tree_context = &options; You're missing a {} here for the "else". Better yet, delete that "else" and change the decl above to be: void *context = &options; Then just keep the "if" here, where you might clobber the "context". > + > + ret = !!read_tree(the_repository, tree, &options.pathspec, fn, read_tree_context); > clear_pathspec(&options.pathspec); > + if (sparse_filter_data) > + free_sparse_filter_data(sparse_filter_data); I suggested a fixup for this, but as an aside don't make a free_*() function that's not happy to accept NULL, it should work like the free() itself. > + > return ret; > } > diff --git a/dir.c b/dir.c > index 4e99f0c868f..122ebced08e 100644 > --- a/dir.c > +++ b/dir.c > @@ -1457,45 +1457,50 @@ int init_sparse_checkout_patterns(struct index_state *istate) > return 0; > } > > -static int path_in_sparse_checkout_1(const char *path, > - struct index_state *istate, > - int require_cone_mode) > +int recursively_match_path_with_sparse_patterns(const char *path, > + struct index_state *istate, > + int dtype, > + struct pattern_list *pl) > { > - int dtype = DT_REG; > enum pattern_match_result match = UNDECIDED; > const char *end, *slash; > - > - /* > - * We default to accepting a path if the path is empty, there are no > - * patterns, or the patterns are of the wrong type. > - */ > - if (!*path || > - init_sparse_checkout_patterns(istate) || > - (require_cone_mode && > - !istate->sparse_checkout_patterns->use_cone_patterns)) > - return 1; > - > /* > * If UNDECIDED, use the match from the parent dir (recursively), or > * fall back to NOT_MATCHED at the topmost level. Note that cone mode > * never returns UNDECIDED, so we will execute only one iteration in > * this case. > */ > - for (end = path + strlen(path); > - end > path && match == UNDECIDED; > + for (end = path + strlen(path); end > path && match == UNDECIDED; All good, aside from this whitespace refactoring, as noted. > end = slash) { > - > for (slash = end - 1; slash > path && *slash != '/'; slash--) > ; /* do nothing */ > > match = path_matches_pattern_list(path, end - path, > slash > path ? slash + 1 : path, &dtype, > - istate->sparse_checkout_patterns, istate); > + pl, istate); > > /* We are going to match the parent dir now */ > dtype = DT_DIR; > } > - return match > 0; > + > + return match; > +} > + > +static int path_in_sparse_checkout_1(const char *path, > + struct index_state *istate, > + int require_cone_mode) > +{ > + /* > + * We default to accepting a path if the path is empty, there are no > + * patterns, or the patterns are of the wrong type. > + */ > + if (!*path || > + init_sparse_checkout_patterns(istate) || > + (require_cone_mode && > + !istate->sparse_checkout_patterns->use_cone_patterns)) > + return 1; > + > + return recursively_match_path_with_sparse_patterns(path, istate, DT_REG, istate->sparse_checkout_patterns) > 0; All good, except for the really long line, aside from the previous suggestion, maybe pull "istate->sparse_checkout_patterns" into a variable, as it's now used twice here in this function? > +check_agrees_with_ls_files () { > + REPO=repo && > + git -C $REPO submodule deinit -f --all && > + git -C $REPO cat-file -p $filter_oid >"$REPO/.git/info/sparse-checkout" && > + git -C $REPO sparse-checkout init --cone && > + git -C $REPO submodule init && > + git -C $REPO ls-files -t >out && > + sed -n "/^S /!s/^. //p" out >expect && > + test_cmp expect actual Instead of this last "sed" munging, why not use the "--format" option to "ls-tree" to just make the output the same?