Re: [PATCH v2] ls-tree: add --sparse-filter-oid argument

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 23/01/2023 14.06, Ævar Arnfjörð Bjarmason wrote:

On Mon, Jan 23 2023, William Sprent via GitGitGadget wrote:

From: William Sprent <williams@xxxxxxxxxxx>

...some further inline commentary, in addition to my proposed squash-in...


Thanks for taking the time. :)

diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
index 0240adb8eec..724bb1f9dbd 100644
--- a/Documentation/git-ls-tree.txt
+++ b/Documentation/git-ls-tree.txt
@@ -11,6 +11,7 @@ SYNOPSIS
  [verse]
  'git ls-tree' [-d] [-r] [-t] [-l] [-z]
  	    [--name-only] [--name-status] [--object-only] [--full-name] [--full-tree] [--abbrev[=<n>]] [--format=<format>]
+	    [--sparse-filter-oid=<blob-ish>]
  	    <tree-ish> [<path>...]
DESCRIPTION
@@ -48,6 +49,11 @@ OPTIONS
  	Show tree entries even when going to recurse them. Has no effect
  	if `-r` was not passed. `-d` implies `-t`.
+--sparse-filter-oid=<blob-ish>::
+	Omit showing tree objects and paths that do not match the
+	sparse-checkout specification contained in the blob
+	(or blob-expression) <blob-ish>.
+
  -l::
  --long::
  	Show object size of blob (file) entries.
diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
index 72eb70823d5..46a815fc7dc 100644
--- a/builtin/ls-tree.c
+++ b/builtin/ls-tree.c
@@ -13,6 +13,7 @@
  #include "builtin.h"
  #include "parse-options.h"
  #include "pathspec.h"
+#include "dir.h"
static const char * const ls_tree_usage[] = {
  	N_("git ls-tree [<options>] <tree-ish> [<path>...]"),
@@ -364,6 +365,65 @@ static struct ls_tree_cmdmode_to_fmt ls_tree_cmdmode_format[] = {
  	},
  };

Okey, here yo uupdate the *.txt, but not the "-h", but it's one of those
where way say <options>.

I for one wouldn't mind some preceding change like my 423be1f83c5 (doc
txt & -h consistency: make "commit" consistent, 2022-10-13), which in
turn would make t/t0450-txt-doc-vs-help.sh pass for this command, but
maybe it's too much of a digression...


I don't mind doing that.

+	(*d) = xcalloc(1, sizeof(**d));
+	(*d)->fn = fn;
+	(*d)->pl.use_cone_patterns = core_sparse_checkout_cone;

I forgot to note in my proposed fix-up that I omitted this, but your
tests still pass, which suggests it's either not needed, or your tests
are lacking....


Well.. The line exists to enable the cone mode optimisations. The only observable differences are performance and that a warning is emitted when cone mode is enabled in the config but the patterns given aren't in the cone mode pattern set.

I can look into writing a performance test that captures the performance differences.

I can also test for the latter behaviour. But that is testing for the side effect of a main behaviour to ensure that the main behaviour is there. Which isn't the nicest thing I can think of, but it might be better than nothing.

+	(*d)->options = options;
+	strbuf_init(&(*d)->full_path_buf, 0);
+
+	if (get_oid_with_context(the_repository, sparse_oid_name, GET_OID_BLOB,
+			&sparse_oid, &oc))
+		die(_("unable to access sparse blob in '%s'"), sparse_oid_name);
+	if (add_patterns_from_blob_to_list(&sparse_oid, "", 0, &(*d)->pl) < 0)

As noted you can avoid the malloc here, which also sidesteps this (IMO
at last) ugly &(*v)->m syntax.


Right.

+		die(_("unable to parse sparse filter data in %s"),
+		    oid_to_hex(&sparse_oid));
+}
+
+static void free_sparse_filter_data(struct sparse_filter_data *d)
+{
+	clear_pattern_list(&d->pl);
+	strbuf_release(&d->full_path_buf);
+	free(d);
+}
+
+static int path_matches_sparse_checkout_patterns(struct strbuf *full_path, struct pattern_list *pl, int dtype)
+{
+	enum pattern_match_result match = recursively_match_path_with_sparse_patterns(full_path->buf, the_repository->index, dtype, pl);

I for one would welcome e.g. abbreviating "sparse_checkout_patterns" as
"scp" or whatever throughout, with resulting shortening of very long
lines & identifiers. E.g. for this one "patch_match_scp" or whatever.



I don't mind either way, but a quick search doesn't reveal any of uses of an abbreviation like, though.

Either way, I could also drop the '_checkout_' from 'sparse_checkout_patterns'. I don't think that would make it less clear what we are talking about.

+	struct sparse_filter_data *data = context;
+

Style: Don't add a \n\n between variable decls,

+	int do_recurse = show_recursive(data->options, base->buf, base->len, pathname);

Instead add it here, before the code proper.



Alright. I'll apply that.

+	if (object_type(mode) == OBJ_TREE)
+		return do_recurse;
+
+	strbuf_reset(&data->full_path_buf);

I wondered if we need this after, but we don't, I for one would welcome a fix-up like:
	
	diff --git a/builtin/ls-tree.c b/builtin/ls-tree.c
	index 68d6ef675f2..74dfa9a4580 100644
	--- a/builtin/ls-tree.c
	+++ b/builtin/ls-tree.c
	@@ -410,19 +410,21 @@ static int filter_sparse(const struct object_id *oid, struct strbuf *base,
	 			 const char *pathname, unsigned mode, void *context)
	 {
	 	struct sparse_filter_data *data = context;
	-
	 	int do_recurse = show_recursive(data->options, base->buf, base->len, pathname);
	+	int ret = 0;
	+
	 	if (object_type(mode) == OBJ_TREE)
	 		return do_recurse;
	
	-	strbuf_reset(&data->full_path_buf);
	 	strbuf_addbuf(&data->full_path_buf, base);
	 	strbuf_addstr(&data->full_path_buf, pathname);
	
	 	if (!path_matches_sparse_checkout_patterns(&data->full_path_buf, &data->pl, DT_REG))
	-			return 0;
	-
	-	return data->fn(oid, base, pathname, mode, data->options);
	+		goto cleanup;
	+	ret = data->fn(oid, base, pathname, mode, data->options);
	+cleanup:
	+	strbuf_reset(&data->full_path_buf);
	+	return ret;
	 }
	
	 int cmd_ls_tree(int argc, const char **argv, const char *prefix)
	
It's slightly more verbose, and we do end up needlessly reset-ing the
"last" one, but makes it clear that we don't have need for this after
this.

Which to me, also raises the question of why you have this
"full_path_buf" at all. It looks like a memory optimization, as you're
trying to avoid a malloc()/free() in the loop.

That's fair, but you could also do that with:

	const size_t oldlen = base->len;
	strbuf_addstr(base, pathname);
         /* do the path_matches_sparse_checkout_patterns() call here */
	/* before "cleanup" */
	strbuf_setlen(base, oldlen);

Or whatever, those snippets are untested, but we use similar tricks
elsewhere, and as it's contained to a few lines here I think it's better
than carrying another buffer.


Okay. That makes sense. And if it is a common trick then I guess it isn't too mysterious.

+	strbuf_addbuf(&data->full_path_buf, base);
+	strbuf_addstr(&data->full_path_buf, pathname);

Nit: Shorter as: strbuf_addf(&sb, "%s%s", base.buf, pathname) (or equivalent);

+
+	if (!path_matches_sparse_checkout_patterns(&data->full_path_buf, &data->pl, DT_REG))
+			return 0;
+
+	return data->fn(oid, base, pathname, mode, data->options);
+}
+
  int cmd_ls_tree(int argc, const char **argv, const char *prefix)
  {
  	struct object_id oid;
@@ -372,7 +432,11 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
  	read_tree_fn_t fn = NULL;
  	enum ls_tree_cmdmode cmdmode = MODE_DEFAULT;
  	int null_termination = 0;
+

Don't add this stray \n.


Ok.

  	struct ls_tree_options options = { 0 };
+	char *sparse_oid_name = NULL;
+	void *read_tree_context = NULL;
+	struct sparse_filter_data *sparse_filter_data = NULL;
  	const struct option ls_tree_options[] = {
  		OPT_BIT('d', NULL, &options.ls_options, N_("only show trees"),
  			LS_TREE_ONLY),
@@ -399,6 +463,9 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
  					 N_("format to use for the output"),
  					 PARSE_OPT_NONEG),
  		OPT__ABBREV(&options.abbrev),
+		OPT_STRING_F(0, "filter-sparse-oid", &sparse_oid_name, N_("filter-sparse-oid"),

Make that N_(...) be "object-id", "oid", "hash" or something?

I.e. the RHS with the <type> should be <thingy>, not
<thingy-for-some-purpose>.


Right. I've used the "blob-ish" wording in the .txt file, I think it makes sense to reuse it here.

+			   N_("filter output with sparse-checkout specification contained in the blob"),
+			     PARSE_OPT_NONEG),
  		OPT_END()
  	};
  	struct ls_tree_cmdmode_to_fmt *m2f = ls_tree_cmdmode_format;
@@ -474,7 +541,17 @@ int cmd_ls_tree(int argc, const char **argv, const char *prefix)
  		break;
  	}
- ret = !!read_tree(the_repository, tree, &options.pathspec, fn, &options);
+	if (sparse_oid_name) {
+		init_sparse_filter_data(&sparse_filter_data, &options, sparse_oid_name, fn);
+		read_tree_context = sparse_filter_data;
+		fn = filter_sparse;
+	} else
+		read_tree_context = &options;

You're missing a {} here for the "else".

Better yet, delete that "else" and change the decl above to be:

	void *context = &options;

Then just keep the "if" here, where you might clobber the "context".


I'll delete the else as you suggest.

+
+	ret = !!read_tree(the_repository, tree, &options.pathspec, fn, read_tree_context);
  	clear_pathspec(&options.pathspec);
+	if (sparse_filter_data)
+		free_sparse_filter_data(sparse_filter_data);

I suggested a fixup for this, but as an aside don't make a free_*()
function that's not happy to accept NULL, it should work like the free()
itself.


Ah. That's a fair point. Thanks.

+
  	return ret;
  }
diff --git a/dir.c b/dir.c
index 4e99f0c868f..122ebced08e 100644
--- a/dir.c
+++ b/dir.c
@@ -1457,45 +1457,50 @@ int init_sparse_checkout_patterns(struct index_state *istate)
  	return 0;
  }
-static int path_in_sparse_checkout_1(const char *path,
-				     struct index_state *istate,
-				     int require_cone_mode)
+int recursively_match_path_with_sparse_patterns(const char *path,
+						struct index_state *istate,
+						int dtype,
+						struct pattern_list *pl)
  {
-	int dtype = DT_REG;
  	enum pattern_match_result match = UNDECIDED;
  	const char *end, *slash;
-
-	/*
-	 * We default to accepting a path if the path is empty, there are no
-	 * patterns, or the patterns are of the wrong type.
-	 */
-	if (!*path ||
-	    init_sparse_checkout_patterns(istate) ||
-	    (require_cone_mode &&
-	     !istate->sparse_checkout_patterns->use_cone_patterns))
-		return 1;
-
  	/*
  	 * If UNDECIDED, use the match from the parent dir (recursively), or
  	 * fall back to NOT_MATCHED at the topmost level. Note that cone mode
  	 * never returns UNDECIDED, so we will execute only one iteration in
  	 * this case.
  	 */
-	for (end = path + strlen(path);
-	     end > path && match == UNDECIDED;
+	for (end = path + strlen(path); end > path && match == UNDECIDED;

All good, aside from this whitespace refactoring, as noted.


Ok.

  	     end = slash) {
-
  		for (slash = end - 1; slash > path && *slash != '/'; slash--)
  			; /* do nothing */
match = path_matches_pattern_list(path, end - path,
  				slash > path ? slash + 1 : path, &dtype,
-				istate->sparse_checkout_patterns, istate);
+				pl, istate);
/* We are going to match the parent dir now */
  		dtype = DT_DIR;
  	}
-	return match > 0;
+
+	return match;
+}
+
+static int path_in_sparse_checkout_1(const char *path,
+				     struct index_state *istate,
+				     int require_cone_mode)
+{
+	/*
+	 * We default to accepting a path if the path is empty, there are no
+	 * patterns, or the patterns are of the wrong type.
+	 */
+	if (!*path ||
+	    init_sparse_checkout_patterns(istate) ||
+	    (require_cone_mode &&
+	     !istate->sparse_checkout_patterns->use_cone_patterns))
+		return 1;
+
+	return recursively_match_path_with_sparse_patterns(path, istate, DT_REG, istate->sparse_checkout_patterns) > 0;

All good, except for the really long line, aside from the previous
suggestion, maybe pull "istate->sparse_checkout_patterns" into a
variable, as it's now used twice here in this function?


I think that makes sense.

+check_agrees_with_ls_files () {
+	REPO=repo  &&
+	git -C $REPO submodule deinit -f --all &&
+	git -C $REPO cat-file -p $filter_oid >"$REPO/.git/info/sparse-checkout" &&
+	git -C $REPO sparse-checkout init --cone &&
+	git -C $REPO submodule init &&
+	git -C $REPO ls-files -t >out &&
+	sed -n "/^S /!s/^. //p" out >expect &&
+	test_cmp expect actual

Instead of this last "sed" munging, why not use the "--format" option to
"ls-tree" to just make the output the same?

I hadn't thought about changing that side of the output. I would have to call ls-tree once more instead of relying on the 'expect' file to be there when the function is called. But that might be nicer anyway.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux