Re: [PATCH v4 3/8] for-each-repo: run subcommands on configured repos

Andrzej Hunt <andrzej@xxxxxxxxx> · Mon, 3 May 2021 18:10:47 +0200

On 15/10/2020 19:21, Derrick Stolee via GitGitGadget wrote:
From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
[... snip ...]

diff --git a/builtin/for-each-repo.c b/builtin/for-each-repo.c
new file mode 100644
index 0000000000..5bba623ff1
--- /dev/null
+++ b/builtin/for-each-repo.c
@@ -0,0 +1,58 @@
+#include "cache.h"
+#include "config.h"
+#include "builtin.h"
+#include "parse-options.h"
+#include "run-command.h"
+#include "string-list.h"
+
+static const char * const for_each_repo_usage[] = {
+	N_("git for-each-repo --config=<config> <command-args>"),
+	NULL
+};
+
+static int run_command_on_repo(const char *path,
+			       void *cbdata)
+{
+	int i;
+	struct child_process child = CHILD_PROCESS_INIT;
+	struct strvec *args = (struct strvec *)cbdata;

I was curious there's a strong reason for declaring args as void * 
followed by this cast? The most obvious answer seems to be that this 
probably evolved from a callback - and we could simplify it now?
+
+	child.git_cmd = 1;
+	strvec_pushl(&child.args, "-C", path, NULL);
+
+	for (i = 0; i < args->nr; i++)
+		strvec_push(&child.args, args->v[i]);
So here we're copying all of args - and I don't see any way of avoiding 
it since we're adding it to child's arg list.

+
+	return run_command(&child);
+}
+
+int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
+{
+	static const char *config_key = NULL;
+	int i, result = 0;
+	const struct string_list *values;
+	struct strvec args = STRVEC_INIT;
+
+	const struct option options[] = {
+		OPT_STRING(0, "config", &config_key, N_("config"),
+			   N_("config key storing a list of repository paths")),
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, for_each_repo_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	if (!config_key)
+		die(_("missing --config=<config>"));
+
+	for (i = 0; i < argc; i++)
+		strvec_push(&args, argv[i]);

But why do we have to copy all of argv 1:1 into args here, only to later 
pass it to run_command_on_repo() which, as described above, copies the 
entire input again? I suspect this was done to comply with 
run_command_on_repo()'s API (which takes strvec) - does that seem 
plausible, or did I miss something?

Which brings me to the real reason for my questions: I noticed we "leak" 
args (this leak is of no significance since it happens in cmd_*, but 
LSAN still complains, and I'm trying to get tests running leak-free). My 
initial inclination was to strvec_clear() or UNLEAK() args - but if we 
can avoid creating args in the first place we also wouldn't need to 
clear it later.

My current proposal is therefore to completely remove args and pass 
argc+argv into run_command_on_repo() - but I wanted to be sure that I 
didn't miss some important reason to stick with the current approach.

+
+	values = repo_config_get_value_multi(the_repository,
+					     config_key);
+
+	for (i = 0; !result && i < values->nr; i++)
+		result = run_command_on_repo(values->items[i].string, &args);
+
+	return result;
+}
[... snip ...]

(I hope this doesn't come across as useless necroposting - I figured it 
would be easier to clarify these questions on the original thread as 
opposed to potentially discussing it as part of my next leak-fixing 
series :) .)

ATB,

  Andrzej