Re: [PATCH 2/5] t/helper/test-hashmap.c: avoid using `strtok()`

Jeff King <peff@xxxxxxxx> · Tue, 18 Apr 2023 06:23:20 -0400

On Thu, Apr 13, 2023 at 07:31:46PM -0400, Taylor Blau wrote:

> Avoid using the non-reentrant `strtok()` to separate the parts of each
> incoming command. Instead of replacing it with `strtok_r()`, let's
> instead use the more friendly `string_list_split_in_place_multi()`.

Junio mentioned this offhand in his response, but I wanted to highlight
one difference in before/after behavior here.

  [before]
  $ printf 'add foo    bar\niterate\n' | t/helper/test-tool hashmap
  foo bar

  [after]
  $ printf 'add foo    bar\niterate\n' | t/helper/test-tool hashmap
  foo    bar

I think that's fine for this test script, but it may be an indication
that we want string-list's split to support different semantics.

> @@ -159,21 +161,34 @@ int cmd__hashmap(int argc, const char **argv)
>  
>  	/* process commands from stdin */
>  	while (strbuf_getline(&line, stdin) != EOF) {
> -		char *cmd, *p1 = NULL, *p2 = NULL;
> +		char *cmd, *p1, *p2;
>  		unsigned int hash = 0;
>  		struct test_entry *entry;
>  
> +		/*
> +		 * Because we memdup() the arguments out of the
> +		 * string_list before inserting them into the hashmap,
> +		 * it's OK to set its length back to zero to avoid
> +		 * re-allocating the items array once per line.
> +		 *
> +		 * By doing so, we'll instead overwrite the existing
> +		 * entries and avoid re-allocating.
> +		 */
> +		parts.nr = 0;

I think this is OK, but I wonder if we should wrap it in a function that
makes sure the strings aren't owned by the strdup (and thus aren't being
leaked). Something like:

  void string_list_setlen(struct string_list *sl, size_t nr)
  {
	/* alternatively, I guess we could actually free nr..sl->nr */
	if (sl->strdup_strings)
		BUG("you can't setlen on a string-list which owns its strings");
	if (nr > sl->nr)
		BUG("you can't grow a string-list with setlen");
	sl->nr = nr;
  }

That is probably overkill for these two test helpers, but I wonder if
the pattern might escape into "real" code (especially if we suggest
using string-list instead of strtok).

-Peff