Re: [PATCH 5/5] run-command API: remove "argv" member, always use "args"

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Mon, 22 Nov 2021 19:19:14 +0100

On Mon, Nov 22 2021, Jeff King wrote:

> On Mon, Nov 22, 2021 at 05:04:07PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> This change is larger than I'd like, but there's no easy way to avoid
>> it that wouldn't involve even more verbose intermediate steps. We use
>> the "argv" as the source of truth over the "args", so we need to
>> change all parts of run-command.[ch] itself, as well as the trace2
>> logging at the same time.
>
> Yeah, agreed most of this needs to come in a bundle. But at least few spots
> could be split out into the earlier patches:
>
>>  builtin/worktree.c          |  2 --
>>  t/helper/test-run-command.c | 10 +++++---
>
> as they are just users of the API.

Will split these out, I had test-run-command.c split initially, but
squashed it locally, which did away with having to explain leaving this
area untested as an intermediate step. But will split again.

>> diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c
>> index 3c4fb862234..b5519f92a19 100644
>> --- a/t/helper/test-run-command.c
>> +++ b/t/helper/test-run-command.c
>> [...]
>> @@ -396,7 +396,8 @@ int cmd__run_command(int argc, const char **argv)
>>  	}
>>  	if (argc < 3)
>>  		return 1;
>> -	proc.argv = (const char **)argv + 2;
>> +	strvec_clear(&proc.args);
>> +	strvec_pushv(&proc.args, (const char **)argv + 2);
>>  
>>  	if (!strcmp(argv[1], "start-command-ENOENT")) {
>>  		if (start_command(&proc) < 0 && errno == ENOENT)
>> @@ -408,7 +409,8 @@ int cmd__run_command(int argc, const char **argv)
>>  		exit(run_command(&proc));
>>  
>>  	jobs = atoi(argv[2]);
>> -	proc.argv = (const char **)argv + 3;
>> +	strvec_clear(&proc.args);
>> +	strvec_pushv(&proc.args, (const char **)argv + 3);
>>  
>>  	if (!strcmp(argv[1], "run-command-parallel"))
>>  		exit(run_processes_parallel(jobs, parallel_next,
>
> These two in particular are tricky. This second strvec_clear() is
> necessary because we are overwriting what was put into proc.args by the
> earlier strvec_pushv(). I think this is one of two interesting ways
> these kinds of trivial conversions can fail:
>
>   - somebody is relying on the fact that "argv = whatever" is an
>     assignment, not an append (which is the case here)
>
>   - somebody is relying on the fact that assignment of the pointer is
>     shallow, whereas strvec_push() is doing a deep copy. So converting:
>
>       cp.argv = argv;
>       argv[1] = "foo";
>
>     to:
>
>       strvec_pushv(&cp.args, argv);
>       argv[1] = "foo";
>
>     is not correct. We wouldn't use the updated "foo". I didn't notice
>     any cases like this during my rough own rough conversion, but they
>     could be lurking.

Yes, I tried to eyeball all the code involved & see if there were any. I
could amend this to start renaming variables, but that'll be a much
larger change.

> The strvec_clear() in the first hunk above is also not necessary (nobody
> has written anything before it), but I don't mind it for consistency /
> being defensive.

*nod*, will remove.

>> @@ -902,8 +900,9 @@ int start_command(struct child_process *cmd)
>>  #else
>>  {
>>  	int fhin = 0, fhout = 1, fherr = 2;
>> -	const char **sargv = cmd->argv;
>> +	const char **sargv = strvec_detach(&cmd->args);
>>  	struct strvec nargv = STRVEC_INIT;
>> +	const char **temp_argv = NULL;
>
> I wondered about detaching here, because other parts of the code
> (finish_command(), tracing, etc) will be looking at cmd->args.v[0] for
> the command name.
>
> But we eventually overwrite it with munged results:
>
>> @@ -929,20 +928,26 @@ int start_command(struct child_process *cmd)
>>  		fhout = dup(cmd->out);
>>  
>>  	if (cmd->git_cmd)
>> -		cmd->argv = prepare_git_cmd(&nargv, cmd->argv);
>> +		temp_argv = prepare_git_cmd(&nargv, sargv);
>>  	else if (cmd->use_shell)
>> -		cmd->argv = prepare_shell_cmd(&nargv, cmd->argv);
>> +		temp_argv = prepare_shell_cmd(&nargv, sargv);
>> +	else
>> +		temp_argv = sargv;
>> +	if (!temp_argv)
>> +		BUG("should have some cmd->args to set");
>> +	strvec_pushv(&cmd->args, temp_argv);
>> +	strvec_clear(&nargv);
>
> So we have to do some replacement. I think the memory management gets
> confusing here, though.
>
> At this point "temp_argv" might point to nargv.v (in which case it is a
> dangling pointer after we clear nargv) or to the original sargv (in
> which case it is memory owned by us that must be freed). The former is
> OK, because we never look at it again. And the latter we eventually
> reattach into &cmd->args, but:
>
>> -	strvec_clear(&nargv);
>> -	cmd->argv = sargv;
>> +	strvec_pushv(&cmd->args, sargv);;
>> +	free(sargv);
>
> I think we still have all the entries we pushed into cmd->args from
> temp_argv earlier. So we'd need to strvec_clear() it before pushing
> sargv again.
>
> And then the free(sargv) is too shallow. It's just freeing the outer
> array, but sargv[0], etc, are leaked.

I'll try to fix that, but updates to this part are very painful, since
I've needed to wait 30m for each change in the Windows CI.

> I think what you really want is the equivalent of strvec_attach(). We
> don't have that, but it's basically just:
>
>   void strvec_attach(struct strvec *s, const char **v)
>   {
> 	/*
> 	 * this is convenient for our caller, but if we wanted to find
> 	 * potential misuses, we could also BUG() if s.nr > 0
> 	 */
> 	strvec_clear(&s);
>
>         /* take over ownership */
> 	s->v = v;
>
> 	/* v must be NULL-terminated; count up to set number */
> 	s->nr = 0;
> 	for (; *v; v++)
> 		s->nr++;
> 	/*
> 	 * we don't know how big the original allocation was, so we can
> 	 * assume only nr. But add 1 to account for the NULL.
> 	 */
> 	s->alloc = s->nr + 1;
>   }
>
> I do think this whole thing would be easier to read if we left cmd->argv
> intact, and just operated on a separate argv strvec. That's what the
> non-Windows side does. But that distinction is nothing new in your
> patch, and I'm not sure if there is a reason that the Windows code does
> it differently.

I did have this with a strvec_attach() locally, but figured I'd get
comments about growing scope & with just one caller.

This version seems to be duplicating things in the existing API though,
I just had the below, which I think works just as well without the
duplication. Perhaps you missed strvec_push_nodup()?

diff --git a/strvec.c b/strvec.c
index 61a76ce6cb9..c10008d792f 100644
--- a/strvec.c
+++ b/strvec.c
@@ -106,3 +106,9 @@ const char **strvec_detach(struct strvec *array)
                return ret;
        }
 }
+
+void strvec_attach(struct strvec *array, const char **items)
+{
+       for (; *items; items++)
+               strvec_push_nodup(array, *items);
+}