Re: [PATCH v4 3/8] for-each-repo: run subcommands on configured repos

Eric Sunshine <sunshine@xxxxxxxxxxxxx> · Mon, 3 May 2021 13:01:42 -0400

I'm not Stolee and hadn't even looked at this code before, but I'll
jump in with a couple comments...

On Mon, May 3, 2021 at 12:11 PM Andrzej Hunt <andrzej@xxxxxxxxx> wrote:
> On 15/10/2020 19:21, Derrick Stolee via GitGitGadget wrote:
> > +static int run_command_on_repo(const char *path,
> > +                            void *cbdata)
> > +{
> > +     int i;
> > +     struct child_process child = CHILD_PROCESS_INIT;
> > +     struct strvec *args = (struct strvec *)cbdata;
>
> I was curious there's a strong reason for declaring args as void *
> followed by this cast? The most obvious answer seems to be that this
> probably evolved from a callback - and we could simplify it now?

Agreed, the `void*` cbdata doesn't make sense here.

> > +     strvec_pushl(&child.args, "-C", path, NULL);
> > +
> > +     for (i = 0; i < args->nr; i++)
> > +             strvec_push(&child.args, args->v[i]);
>
> So here we're copying all of args - and I don't see any way of avoiding
> it since we're adding it to child's arg list.

... (dot dot dot)

> > +int cmd_for_each_repo(int argc, const char **argv, const char *prefix)
> > +{
> > +     struct strvec args = STRVEC_INIT;
> > +     for (i = 0; i < argc; i++)
> > +             strvec_push(&args, argv[i]);
>
> But why do we have to copy all of argv 1:1 into args here, only to later
> pass it to run_command_on_repo() which, as described above, copies the
> entire input again? I suspect this was done to comply with
> run_command_on_repo()'s API (which takes strvec) - does that seem
> plausible, or did I miss something?
>
> Which brings me to the real reason for my questions: I noticed we "leak"
> args (this leak is of no significance since it happens in cmd_*, but
> LSAN still complains, and I'm trying to get tests running leak-free). My
> initial inclination was to strvec_clear() or UNLEAK() args - but if we
> can avoid creating args in the first place we also wouldn't need to
> clear it later.
>
> My current proposal is therefore to completely remove args and pass
> argc+argv into run_command_on_repo() - but I wanted to be sure that I
> didn't miss some important reason to stick with the current approach.

An alternative to all this copying would be to take advantage of
child_process.argv which is owned by the caller, thus does not get
freed automatically by run_command(). This would allow you to re-use
the same argument vector for all calls. And you don't need
run_command_on_repo() at all. Something like this in
cmd_for_each_repo(), untested and typed directly in email:

    struct child_process child = CHILD_PROCESS_INIT;

    for (i = 0; !result && i < values->nr; i++) {
        const char *d = chdir(values->items[i].string);
        if (chdir(d))
            die_errno(_("cannot chdir to '%s'"), d);
        child.git_cmd = 1;
        child.argv = argv;
        result = run_command(&child);
    }

This assumes that argv[] is correctly NULL-terminated after
parse_options() -- I didn't check, but expect it to be so. If not,
it's easy enough to copy argv[] into `args` once and then
strvec_clear(&args) at the end of the function.

The one downside is that trace output wouldn't be as helpful (I think)
because you wouldn't see an explicit "-C <dir>", but I suppose the
tracing machinery can be invoked manually to address this.