On Mon, Nov 22 2021, Jeff King wrote: > On Mon, Nov 22, 2021 at 05:04:07PM +0100, Ævar Arnfjörð Bjarmason wrote: > >> This change is larger than I'd like, but there's no easy way to avoid >> it that wouldn't involve even more verbose intermediate steps. We use >> the "argv" as the source of truth over the "args", so we need to >> change all parts of run-command.[ch] itself, as well as the trace2 >> logging at the same time. > > Yeah, agreed most of this needs to come in a bundle. But at least few spots > could be split out into the earlier patches: > >> builtin/worktree.c | 2 -- >> t/helper/test-run-command.c | 10 +++++--- > > as they are just users of the API. Will split these out, I had test-run-command.c split initially, but squashed it locally, which did away with having to explain leaving this area untested as an intermediate step. But will split again. >> diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c >> index 3c4fb862234..b5519f92a19 100644 >> --- a/t/helper/test-run-command.c >> +++ b/t/helper/test-run-command.c >> [...] >> @@ -396,7 +396,8 @@ int cmd__run_command(int argc, const char **argv) >> } >> if (argc < 3) >> return 1; >> - proc.argv = (const char **)argv + 2; >> + strvec_clear(&proc.args); >> + strvec_pushv(&proc.args, (const char **)argv + 2); >> >> if (!strcmp(argv[1], "start-command-ENOENT")) { >> if (start_command(&proc) < 0 && errno == ENOENT) >> @@ -408,7 +409,8 @@ int cmd__run_command(int argc, const char **argv) >> exit(run_command(&proc)); >> >> jobs = atoi(argv[2]); >> - proc.argv = (const char **)argv + 3; >> + strvec_clear(&proc.args); >> + strvec_pushv(&proc.args, (const char **)argv + 3); >> >> if (!strcmp(argv[1], "run-command-parallel")) >> exit(run_processes_parallel(jobs, parallel_next, > > These two in particular are tricky. This second strvec_clear() is > necessary because we are overwriting what was put into proc.args by the > earlier strvec_pushv(). I think this is one of two interesting ways > these kinds of trivial conversions can fail: > > - somebody is relying on the fact that "argv = whatever" is an > assignment, not an append (which is the case here) > > - somebody is relying on the fact that assignment of the pointer is > shallow, whereas strvec_push() is doing a deep copy. So converting: > > cp.argv = argv; > argv[1] = "foo"; > > to: > > strvec_pushv(&cp.args, argv); > argv[1] = "foo"; > > is not correct. We wouldn't use the updated "foo". I didn't notice > any cases like this during my rough own rough conversion, but they > could be lurking. Yes, I tried to eyeball all the code involved & see if there were any. I could amend this to start renaming variables, but that'll be a much larger change. > The strvec_clear() in the first hunk above is also not necessary (nobody > has written anything before it), but I don't mind it for consistency / > being defensive. *nod*, will remove. >> @@ -902,8 +900,9 @@ int start_command(struct child_process *cmd) >> #else >> { >> int fhin = 0, fhout = 1, fherr = 2; >> - const char **sargv = cmd->argv; >> + const char **sargv = strvec_detach(&cmd->args); >> struct strvec nargv = STRVEC_INIT; >> + const char **temp_argv = NULL; > > I wondered about detaching here, because other parts of the code > (finish_command(), tracing, etc) will be looking at cmd->args.v[0] for > the command name. > > But we eventually overwrite it with munged results: > >> @@ -929,20 +928,26 @@ int start_command(struct child_process *cmd) >> fhout = dup(cmd->out); >> >> if (cmd->git_cmd) >> - cmd->argv = prepare_git_cmd(&nargv, cmd->argv); >> + temp_argv = prepare_git_cmd(&nargv, sargv); >> else if (cmd->use_shell) >> - cmd->argv = prepare_shell_cmd(&nargv, cmd->argv); >> + temp_argv = prepare_shell_cmd(&nargv, sargv); >> + else >> + temp_argv = sargv; >> + if (!temp_argv) >> + BUG("should have some cmd->args to set"); >> + strvec_pushv(&cmd->args, temp_argv); >> + strvec_clear(&nargv); > > So we have to do some replacement. I think the memory management gets > confusing here, though. > > At this point "temp_argv" might point to nargv.v (in which case it is a > dangling pointer after we clear nargv) or to the original sargv (in > which case it is memory owned by us that must be freed). The former is > OK, because we never look at it again. And the latter we eventually > reattach into &cmd->args, but: > >> - strvec_clear(&nargv); >> - cmd->argv = sargv; >> + strvec_pushv(&cmd->args, sargv);; >> + free(sargv); > > I think we still have all the entries we pushed into cmd->args from > temp_argv earlier. So we'd need to strvec_clear() it before pushing > sargv again. > > And then the free(sargv) is too shallow. It's just freeing the outer > array, but sargv[0], etc, are leaked. I'll try to fix that, but updates to this part are very painful, since I've needed to wait 30m for each change in the Windows CI. > I think what you really want is the equivalent of strvec_attach(). We > don't have that, but it's basically just: > > void strvec_attach(struct strvec *s, const char **v) > { > /* > * this is convenient for our caller, but if we wanted to find > * potential misuses, we could also BUG() if s.nr > 0 > */ > strvec_clear(&s); > > /* take over ownership */ > s->v = v; > > /* v must be NULL-terminated; count up to set number */ > s->nr = 0; > for (; *v; v++) > s->nr++; > /* > * we don't know how big the original allocation was, so we can > * assume only nr. But add 1 to account for the NULL. > */ > s->alloc = s->nr + 1; > } > > I do think this whole thing would be easier to read if we left cmd->argv > intact, and just operated on a separate argv strvec. That's what the > non-Windows side does. But that distinction is nothing new in your > patch, and I'm not sure if there is a reason that the Windows code does > it differently. I did have this with a strvec_attach() locally, but figured I'd get comments about growing scope & with just one caller. This version seems to be duplicating things in the existing API though, I just had the below, which I think works just as well without the duplication. Perhaps you missed strvec_push_nodup()? diff --git a/strvec.c b/strvec.c index 61a76ce6cb9..c10008d792f 100644 --- a/strvec.c +++ b/strvec.c @@ -106,3 +106,9 @@ const char **strvec_detach(struct strvec *array) return ret; } } + +void strvec_attach(struct strvec *array, const char **items) +{ + for (; *items; items++) + strvec_push_nodup(array, *items); +}