Re: [PATCH v2 05/10] strmap: new utility functions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 13, 2020 at 12:40:45AM +0000, Elijah Newren via GitGitGadget wrote:

> Add strmap as a new struct and associated utility functions,
> specifically for hashmaps that map strings to some value.  The API is
> taken directly from Peff's proposal at
> https://lore.kernel.org/git/20180906191203.GA26184@xxxxxxxxxxxxxxxxxxxxx/

This looks overall sane to me. I mentioned elsewhere that we could be
using FLEXPTR_ALLOC to save an extra allocation. I think it's easy and
worth doing here, as the logic would be completely contained within
strmap_put():

  if (strdup_strings)
	FLEXPTR_ALLOC_STR(entry, key, str);
  else {
	entry = xmalloc(sizeof(*entry));
	entry->key = str;
  }

And free_entries() then doesn't even have to care about strdup_strings.

> A couple of items of note:
> 
>   * Similar to string-list, I have a strdup_strings setting.  However,
>     unlike string-list, strmap_init() does not take a parameter for this
>     setting and instead automatically sets it to 1; callers who want to
>     control this detail need to instead call strmap_ocd_init().

That seems reasonable. It could just be a parameter, but I like that you
push people in the direction of doing the simple and safe thing, rather
than having them wonder whether they ought to set strdup_strings or not.

>   * I do not have a STRMAP_INIT macro.  I could possibly add one, but
>       #define STRMAP_INIT { { NULL, cmp_str_entry, NULL, 0, 0, 0, 0, 0 }, 1 }
>     feels a bit unwieldy and possibly error-prone in terms of future
>     expansion of the hashmap struct.  The fact that cmp_str_entry needs to
>     be in there prevents us from passing all zeros for the hashmap, and makes
>     me worry that STRMAP_INIT would just be more trouble than it is worth.

You can actually omit everything after cmp_str_entry, and those fields
would all get zero-initialized. But we also allow C99 designed
initializers these days. Coupled with the HASHMAP_INIT() I mentioned in
the earlier email, you'd have:

  #define STRMAP_INIT { \
		.map = HASHMAP_INIT(cmp_strmap_entry, NULL), \
		.strdup_strings = 1, \
	  }

which seems pretty maintainable.

> +static int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
> +			    const struct hashmap_entry *entry1,
> +			    const struct hashmap_entry *entry2,
> +			    const void *keydata)
> +{
> +	const struct strmap_entry *e1, *e2;
> +
> +	e1 = container_of(entry1, const struct strmap_entry, ent);
> +	e2 = container_of(entry2, const struct strmap_entry, ent);
> +	return strcmp(e1->key, e2->key);
> +}

I expected to use keydata here, but it's pretty easy to make a fake
strmap_entry because of the use of the "key" pointer. So that makes
sense.

> +static void strmap_free_entries_(struct strmap *map, int free_util)

You use the term "value" for the mapped-to value in this iteration. So
perhaps free_values here (and in other functions) would be a better
name?

> +	/*
> +	 * We need to iterate over the hashmap entries and free
> +	 * e->key and e->value ourselves; hashmap has no API to
> +	 * take care of that for us.  Since we're already iterating over
> +	 * the hashmap, though, might as well free e too and avoid the need
> +	 * to make some call into the hashmap API to do that.
> +	 */
> +	hashmap_for_each_entry(&map->map, &iter, e, ent) {
> +		if (free_util)
> +			free(e->value);
> +		if (map->strdup_strings)
> +			free((char*)e->key);
> +		free(e);
> +	}
> +}

Yep, makes sense.

> +void strmap_clear(struct strmap *map, int free_util)
> +{
> +	strmap_free_entries_(map, free_util);
> +	hashmap_free(&map->map);
> +}

This made me wonder about a partial_clear(), but it looks like that
comes later.

> +void *strmap_put(struct strmap *map, const char *str, void *data)
> +{
> +	struct strmap_entry *entry = find_strmap_entry(map, str);
> +	void *old = NULL;
> +
> +	if (entry) {
> +		old = entry->value;
> +		entry->value = data;

Here's a weird hypothetical. If strdup_strings is not set and I do:

  const char *one = xstrdup("foo");
  const char *two = xstrdup("foo");

  hashmap_put(map, one, x);
  hashmap_put(map, two, y);

it's clear that the value should be pointing to "y" afterwards (and you
return "x" so the caller can free it or whatever, good).

But which key should the entry be pointing to? The old one or the new
one? I'm trying and failing to think of a case where it would matter.
Certainly I could add a free() to the toy above where it would, but it
feels like a real caller would have to have pretty convoluted memory
lifetime semantics for it to make a difference.

So I'm not really asking for a particular behavior, but just bringing it
up in case you can think of something relevant.

> +	} else {
> +		/*
> +		 * We won't modify entry->key so it really should be const.
> +		 */
> +		const char *key = str;

The "should be" here confused me. It _is_ const. I'd probably just
delete the comment entirely, but perhaps:

  /*
   * We'll store a const pointer. For non-duplicated strings, they belong
   * to the caller and we received them as const in the first place. For
   * our duplicated ones, they do point to memory we own, but they're
   * still conceptually constant within the lifetime of an entry.
   */

Though it might make more sense in the struct definition, not here.

> +void *strmap_get(struct strmap *map, const char *str)
> +{
> +	struct strmap_entry *entry = find_strmap_entry(map, str);
> +	return entry ? entry->value : NULL;
> +}

Just noting that the caller can't tell the difference between "no such
entry" and "the entry is storing NULL". I think the simplicity offered
by this interface makes it worth having (and being the primary one). If
some caller really needs to tell the difference between the two, we can
add another function later.

Obviously they could use strmap_contains(), but that would mean two hash
lookups.

> +/*
> + * Same as strmap_init, but for those who want to control the memory management
> + * carefully instead of using the default of strdup_strings=1.
> + * (OCD = Obsessive Compulsive Disorder, a joke that those who use this function
> + * are obsessing over minor details.)
> + */
> +void strmap_ocd_init(struct strmap *map,
> +		     int strdup_strings);

I'm not personally bothered by this name, but I wonder if some people
may be (because they have or know somebody who actually has OCD).

Perhaps strmap_init_with_options() would be a better name? It likewise
would extend well if we want to add other non-default options later.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux