Re: [PATCH 1/2] ls-files: optionally recurse into submodules

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 21, 2016 at 03:42:52PM -0700, Brandon Williams wrote:

> @@ -68,6 +71,21 @@ static void write_eolinfo(const struct cache_entry *ce, const char *path)
>  static void write_name(const char *name)
>  {
>  	/*
> +	 * NEEDSWORK: To make this thread-safe, full_name would have to be owned
> +	 * by the caller.

I'm not sure if that is quite true. You could simply drop the "static"
here, and pay the malloc cost each time. But if we want to amortize the
allocation, then yeah, somebody else on the stack needs to own it.

That being said, I don't know if it is worth pointing this out as a
NEEDSWORK or not. Most of git is not thread-safe, so I think that's
generally the norm.

> +	 *
> +	 * full_name get reused across output lines to minimize the allocation
> +	 * churn.

Likewise, this kind of strbuf-reuse trickery is common in git. I don't
know if it's worth a comment or not (though I mind this one much less,
because it's a bit more subtle, being across multiple calls rather than
in a single loop).

(To be clear, I actually don't mind either _that_ much; usually the
problem is under-commenting, not over-commenting. I am just trying to
give style pointers since you are new to the project).

> +	static struct strbuf full_name = STRBUF_INIT;
> +	if (submodule_prefix && *submodule_prefix) {
> +		strbuf_reset(&full_name);
> +		strbuf_addstr(&full_name, submodule_prefix);
> +		strbuf_addstr(&full_name, name);
> +		name = full_name.buf;
> +	}

I actually wonder if it would be more obvious if we hoisted out the
prefix copy, too. IOW, have a global:

  /* used to assemble full names in place */
  static struct strbuf prefixed_name = STRBUF_INIT;
  static size_t prefixed_name_base;

then parse --submodule-prefix as:

  strbuf_reset(&prefixed_name);
  strbuf_addstr(&prefixed_name, arg);
  prefixed_name_base = prefixed_name.len;

and then here do:

  if (prefixed_name_base) {
	strbuf_addstr(&prefixed_name, name);
	name = prefixed_name.buf;
  }
  ... do stuff with name ...
  if (prefixed_name_base)
	strbuf_setlen(&prefixed_name, prefixed_name_base);

I dunno. It is slightly more efficient (we do not memcpy the prefix over
and over), but I doubt that matters much in practice. I just wonder if
it makes the buffer reuse a bit more obvious.

> +/**
> + * Recursively call ls-files on a submodule
> + */
> +static void show_gitlink(const struct cache_entry *ce)
> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	int status;
> +
> +	argv_array_push(&cp.args, "ls-files");
> +	argv_array_push(&cp.args, "--recurse-submodules");
> +	argv_array_pushf(&cp.args, "--submodule-prefix=%s%s/",
> +			 submodule_prefix ? submodule_prefix : "",
> +			 ce->name);
> +	cp.git_cmd = 1;
> +	cp.dir = ce->name;
> +	status = run_command(&cp);
> +	if (status)
> +		exit(status);
> +}

This doesn't propagate the parent argv at all. So if I run:

  git ls-files -z --recurse-submodules

then the paths are all NUL-terminated in the parent, but
newline-terminated in the submodules. Oops.

I see in the second patch you make an effort to pass in some specific
options explicitly. In some ways that is safer (we do not accidentally
pass an argument that needs to be munged). But in some ways it is less
safe (as shown by "-z" above, anything we fail to pass in show_gitlink()
must be explicitly blocked in cmd_ls_files(), or we produce garbage
output). It also means that we have a bunch of boilerplate forwarding
the options along (e.g., your second patch lets through --stage, but why
not other ones like "--deleted"?).

So I dunno. I would have done it more like:

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 03c283e..2a3d65c 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -31,6 +31,7 @@ static int debug_mode;
 static int show_eol;
 static const char *submodule_prefix;
 static int recurse_submodules;
+static struct argv_array recurse_argv = ARGV_ARRAY_INIT;
 
 static const char *prefix;
 static int max_prefix_len;
@@ -170,6 +171,32 @@ static void show_killed_files(struct dir_struct *dir)
 	}
 }
 
+/*
+ * Copy all of the parent arguments, but omit --submodule-prefix, as
+ * we will be adding our own. Other arguments are assumed to behave
+ * reasonably in the submodule, or to be blocked explicitly in
+ * cmd_ls_files().
+ *
+ * We have to make a full copy here, because parse_options will munge our
+ * original.
+ */
+static void store_recurse_argv(const char **argv)
+{
+	for (; *argv; argv++) {
+		const char *arg = *argv;
+
+		/* Yikes, we're reinventing option parsing here. */
+		if (starts_with(arg, "--submodule-prefix="))
+			continue;
+		if (!strcmp(arg, "--submodule-prefix")) {
+			if (argv[1])
+				argv++;
+			continue;
+		}
+		argv_array_push(&recurse_argv, arg);
+	}
+}
+
 /**
  * Recursively call ls-files on a submodule
  */
@@ -177,9 +204,10 @@ static void show_gitlink(const struct cache_entry *ce)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	int status;
+	int i;
 
-	argv_array_push(&cp.args, "ls-files");
-	argv_array_push(&cp.args, "--recurse-submodules");
+	for (i = 0; i < recurse_argv.argc; i++)
+		argv_array_push(&cp.args, recurse_argv.argv[i]);
 	argv_array_pushf(&cp.args, "--submodule-prefix=%s%s/",
 			 submodule_prefix ? submodule_prefix : "",
 			 ce->name);
@@ -535,6 +563,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	if (read_cache() < 0)
 		die("index file corrupt");
 
+	store_recurse_argv(argv);
+
 	argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
 			ls_files_usage, 0);
 	el = add_exclude_list(&dir, EXC_CMDL, "--exclude option");
@@ -565,13 +595,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	if (require_work_tree && !is_inside_work_tree())
 		setup_work_tree();
 
-	if (recurse_submodules &&
-	    (show_stage || show_deleted || show_others || show_unmerged ||
-	     show_killed || show_modified || show_resolve_undo ||
-	     show_valid_bit || show_tag || show_eol))
-		die("ls-files --recurse-submodules can only be used in "
-		    "--cached mode");
-
 	if (recurse_submodules && error_unmatch)
 		die("ls-files --recurse-submodules does not support "
 		    "--error-unmatch");

which I admit is somewhat ugly, due to the "yikes" comment above. For
example, I think parse_options() will actually allow a unique prefix
like "--submodule-pre=foo/", which we would not catch. Or worse, we
would accidentally munge the pathspec in:

  git ls-files --recurse-submodules -- --submodule-prefix=foo

(or any case where a non-option looked like "--submodule-prefix=".
Obviously those are pathological, but it's still nasty.

I think we _could_ actually avoid picking out the --submodule-prefix
argument entirely, and instead just add our own at the end to override
it.

I don't think it actually matters that much here.  You _could_ get away
with even leaving the old --submodule-prefix in place, and just adding
our new one at the end to override it. That works, though it does mean
that your argv gets continually longer as you recurse modules. Or I
suppose another way would be to make the prefix option additive, and
then just literally add "--prefix-add=$name" at each level. The option
parser would then build up the real prefix from multiple prefix-add
options.

The final thing I could think of is that we could teach parse_options()
to record a canonicalized copy of all of the options in an argv_array.
So it would normalize "--submodule-pre foo" to "--submodule-prefix=foo",
and that makes our after-the-fact parsing much easier.

But like I said, I dunno. I'm on the fence on which approach is the
least ugly (including your original, to just forward specific options).
They all have their warts. :)

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]