Re: [PATCH v3 4/4] stash: implement builtin stash

Jeff King <peff@xxxxxxxx> · Thu, 1 Jun 2017 00:07:17 -0400

On Wed, May 31, 2017 at 08:29:43PM -0700, Joel Teichroeb wrote:

> I'm running into a lot of trouble using argv_array_clear. It seems
> that some of the builtin git cmd functions move the parameters around,
> and write new pointers to argv. There's three options I have now, and
> I'm not sure which is the best one.

Hrm. It's normal for parsing to reorder the parameters (e.g., shifting
non-options to the front), but that should still allow a clear at the
end. New pointers would definitely cause a problem, though. I don't know
of any cases where we do that, but on the other hand I wouldn't be too
surprised to find that the revision.c options parser does some nasty
tricks.
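
To make the distinction concrete, here is a rough standalone sketch
(plain C, nothing from git; fake_parse() and surviving_pointers() are
made-up names for illustration, not real git functions). Swapping
pointers around leaves every allocation reachable through the array, so
a clear at the end still works; overwriting a slot loses one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define N 4

/*
 * Hypothetical parser: swaps non-options to the front of argv. If
 * "clobber" is set, it also overwrites slot 0 with a pointer that the
 * caller never allocated. Returns the number of non-options.
 */
static int fake_parse(char **argv, int argc, int clobber)
{
	static char replacement[] = "--new";
	int i, left = 0;

	for (i = 0; i < argc; i++) {
		if (argv[i][0] != '-') {
			char *tmp = argv[left];
			argv[left++] = argv[i];
			argv[i] = tmp;
		}
	}
	if (clobber && argc > 0)
		argv[0] = replacement;
	return left;
}

/*
 * Returns how many of the original allocations are still reachable
 * through argv after parsing. Anything missing could not be freed by
 * walking argv alone.
 */
static int surviving_pointers(int clobber)
{
	const char *src[N] = { "-a", "foo", "-b", "bar" };
	char *argv[N], *orig[N];
	int i, j, n = 0;

	for (i = 0; i < N; i++) {
		orig[i] = malloc(strlen(src[i]) + 1);
		assert(orig[i]);
		strcpy(orig[i], src[i]);
		argv[i] = orig[i];
	}

	fake_parse(argv, N, clobber);

	for (i = 0; i < N; i++) {
		for (j = 0; j < N; j++) {
			if (argv[j] == orig[i]) {
				n++;
				break;
			}
		}
	}

	/* free via orig only so that the demo itself does not leak */
	for (i = 0; i < N; i++)
		free(orig[i]);
	return n;
}
```

With reordering only, surviving_pointers(0) finds all four allocations;
with the overwrite, surviving_pointers(1) finds three.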

Do you have a specific example? I'd be curious to see if we can just fix
the parser to be less surprising (i.e., your (1) below).

> 1. Fix all the builtin cmd functions that I use to not mess around with argv

If it's just one or two spots, this might be viable.

> 2. Stop using the builtin cmd functions, and use child processes exclusively

That might not be the worst thing in the world for a first cut at a
shell-to-C transition, because it eliminates a whole class of possible
problems. But it really just side-steps the problem, as we'd want to
eventually deal with it and reduce the process count.

> 3. Don't worry about clearing the memory used for these function calls.

That might be doable, as long as the leaks are O(1) for a program run
(and not, say, a leak per commit). At the very least we should mark
those spots with a "NEEDSWORK" comment and an explanation of the issue
so that your work in finding them isn't wasted.

> It looks like the rest of the code generally does #3.

It looks like we don't actually pass argv arrays to setup_revisions()
all that often. The three I see are:

  - bisect_rev_setup(), which is a known leak. This is trickier, though,
    because we actually pass the initialized rev_info out of the
    function, and the memory needs to last until we're done with the
    traversal.

  - http-push, which does seem to free the memory

  - stat_tracking_info(), which does seem to free the memory

I could well believe there are places where we leak, though, especially
for top-level functions that exit the program when they're done.

A fourth option is to copy the argv array into something that can be
massaged by the callee, and retain the original array for freeing. I.e.,
something like:

  struct argv_array argv = ARGV_ARRAY_INIT;
  const char **massaged;

  argv_array_pushl(&argv, ...whatever...);

  /* copy the pointers only, including the trailing NULL */
  ALLOC_ARRAY(massaged, argv.argc + 1);
  COPY_ARRAY(massaged, argv.argv, argv.argc + 1);

  setup_revisions(argv.argc, massaged, &revs, NULL);

  /*
   * No clue what's in "massaged" now, as setup_revisions() may have
   * reordered things, added new elements, deleted some, etc. But we
   * don't have to care because any pointers we need to free are still
   * in the original argv struct, and we should be safe to free the
   * massaged array itself.
   */
  free(massaged);
  argv_array_clear(&argv);

That's pretty horrible, though. If setup_revisions() is requiring us to
do that, I'd really prefer to look into fixing it.
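
For anyone who wants to poke at the pattern outside of git, here is a
standalone sketch of the same idea in plain C. It is only an
illustration under assumptions: nasty_parse() is a made-up stand-in for
a callee that mangles its argv (I have not checked that
setup_revisions() does any of this), and run_with_shadow_copy() is
likewise hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical callee that abuses its argv: it swaps the first and
 * last entries, overwrites a slot with a pointer the caller does not
 * own, and truncates the array. Returns the new count.
 */
static int nasty_parse(int argc, const char **argv)
{
	const char *tmp;

	assert(argc >= 2);
	tmp = argv[0];
	argv[0] = argv[argc - 1];
	argv[argc - 1] = tmp;
	argv[1] = "--rewritten";	/* not heap memory */
	argv[argc - 1] = NULL;		/* drops an entry */
	return argc - 1;
}

/*
 * Hands the callee a shallow copy of the pointer array and keeps the
 * original for freeing. Returns 1 if the original array still holds
 * every string, in order, after the callee is done.
 */
static int run_with_shadow_copy(void)
{
	const char *src[] = { "rev-list", "--all", "HEAD" };
	int argc = 3, i, intact = 1;
	char **owned = malloc((argc + 1) * sizeof(*owned));
	const char **massaged = malloc((argc + 1) * sizeof(*massaged));

	assert(owned && massaged);
	for (i = 0; i < argc; i++) {
		owned[i] = malloc(strlen(src[i]) + 1);
		assert(owned[i]);
		strcpy(owned[i], src[i]);
	}
	owned[argc] = NULL;

	/* shallow copy: same strings, separate pointer array */
	memcpy(massaged, owned, (argc + 1) * sizeof(*massaged));

	nasty_parse(argc, massaged);

	for (i = 0; i < argc; i++)
		if (strcmp(owned[i], src[i]))
			intact = 0;

	/* free only the copied array, then everything we own */
	free(massaged);
	for (i = 0; i < argc; i++)
		free(owned[i]);
	free(owned);
	return intact;
}
```

The cost is one extra pointer array per call, which is cheap, but every
caller has to remember to do the dance.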

-Peff