I'm not Stolee and hadn't even looked at this code before, but I'll jump in with a couple comments... On Mon, May 3, 2021 at 12:11 PM Andrzej Hunt <andrzej@xxxxxxxxx> wrote: > On 15/10/2020 19:21, Derrick Stolee via GitGitGadget wrote: > > +static int run_command_on_repo(const char *path, > > + void *cbdata) > > +{ > > + int i; > > + struct child_process child = CHILD_PROCESS_INIT; > > + struct strvec *args = (struct strvec *)cbdata; > > I was curious there's a strong reason for declaring args as void * > followed by this cast? The most obvious answer seems to be that this > probably evolved from a callback - and we could simplify it now? Agreed, the `void*` cbdata doesn't make sense here. > > + strvec_pushl(&child.args, "-C", path, NULL); > > + > > + for (i = 0; i < args->nr; i++) > > + strvec_push(&child.args, args->v[i]); > > So here we're copying all of args - and I don't see any way of avoiding > it since we're adding it to child's arg list. ... (dot dot dot) > > +int cmd_for_each_repo(int argc, const char **argv, const char *prefix) > > +{ > > + struct strvec args = STRVEC_INIT; > > + for (i = 0; i < argc; i++) > > + strvec_push(&args, argv[i]); > > But why do we have to copy all of argv 1:1 into args here, only to later > pass it to run_command_on_repo() which, as described above, copies the > entire input again? I suspect this was done to comply with > run_command_on_repo()'s API (which takes strvec) - does that seem > plausible, or did I miss something? > > Which brings me to the real reason for my questions: I noticed we "leak" > args (this leak is of no significance since it happens in cmd_*, but > LSAN still complains, and I'm trying to get tests running leak-free). My > initial inclination was to strvec_clear() or UNLEAK() args - but if we > can avoid creating args in the first place we also wouldn't need to > clear it later. > > My current proposal is therefore to completely remove args and pass > argc+argv into run_command_on_repo() - but I wanted to be sure that I > didn't miss some important reason to stick with the current approach. An alternative to all this copying would be to take advantage of child_process.argv which is owned by the caller, thus does not get freed automatically by run_command(). This would allow you to re-use the same argument vector for all calls. And you don't need run_command_on_repo() at all. Something like this in cmd_for_each_repo(), untested and typed directly in email: struct child_process child = CHILD_PROCESS_INIT; for (i = 0; !result && i < values->nr; i++) { const char *d = chdir(values->items[i].string); if (chdir(d)) die_errno(_("cannot chdir to '%s'"), d); child.git_cmd = 1; child.argv = argv; result = run_command(&child); } This assumes that argv[] is correctly NULL-terminated after parse_options() -- I didn't check, but expect it to be so. If not, it's easy enough to copy argv[] into `args` once and then strvec_clear(&args) at the end of the function. The one downside is that trace output wouldn't be as helpful (I think) because you wouldn't see an explicit "-C <dir>", but I suppose the tracing machinery can be invoked manually to address this.