Re: [PATCH v2 05/10] strmap: new utility functions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Oct 30, 2020 at 7:12 AM Jeff King <peff@xxxxxxxx> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:45AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Add strmap as a new struct and associated utility functions,
> > specifically for hashmaps that map strings to some value.  The API is
> > taken directly from Peff's proposal at
> > https://lore.kernel.org/git/20180906191203.GA26184@xxxxxxxxxxxxxxxxxxxxx/
>
> This looks overall sane to me. I mentioned elsewhere that we could be
> using FLEXPTR_ALLOC to save an extra allocation. I think it's easy and
> worth doing here, as the logic would be completely contained within
> strmap_put():
>
>   if (strdup_strings)
>         FLEXPTR_ALLOC_STR(entry, key, str);
>   else {
>         entry = xmalloc(sizeof(*entry));
>         entry->key = str;
>   }
>
> And free_entries() then doesn't even have to care about strdup_strings.

Yeah, as you noted in your review of 10/10 this idea wouldn't play
well with the later mem_pool changes.

> > A couple of items of note:
> >
> >   * Similar to string-list, I have a strdup_strings setting.  However,
> >     unlike string-list, strmap_init() does not take a parameter for this
> >     setting and instead automatically sets it to 1; callers who want to
> >     control this detail need to instead call strmap_ocd_init().
>
> That seems reasonable. It could just be a parameter, but I like that you
> push people in the direction of doing the simple and safe thing, rather
> than having them wonder whether they ought to set strdup_strings or not.

Well, in my first round of the series where I did make it a parameter
you balked pretty loudly.  ;-)

> >   * I do not have a STRMAP_INIT macro.  I could possibly add one, but
> >       #define STRMAP_INIT { { NULL, cmp_str_entry, NULL, 0, 0, 0, 0, 0 }, 1 }
> >     feels a bit unwieldy and possibly error-prone in terms of future
> >     expansion of the hashmap struct.  The fact that cmp_str_entry needs to
> >     be in there prevents us from passing all zeros for the hashmap, and makes
> >     me worry that STRMAP_INIT would just be more trouble than it is worth.
>
> You can actually omit everything after cmp_str_entry, and those fields
> would all get zero-initialized. But we also allow C99 designed
> initializers these days. Coupled with the HASHMAP_INIT() I mentioned in
> the earlier email, you'd have:
>
>   #define STRMAP_INIT { \
>                 .map = HASHMAP_INIT(cmp_strmap_entry, NULL), \
>                 .strdup_strings = 1, \
>           }
>
> which seems pretty maintainable.

Makes sense; will add.

> > +static int cmp_strmap_entry(const void *hashmap_cmp_fn_data,
> > +                         const struct hashmap_entry *entry1,
> > +                         const struct hashmap_entry *entry2,
> > +                         const void *keydata)
> > +{
> > +     const struct strmap_entry *e1, *e2;
> > +
> > +     e1 = container_of(entry1, const struct strmap_entry, ent);
> > +     e2 = container_of(entry2, const struct strmap_entry, ent);
> > +     return strcmp(e1->key, e2->key);
> > +}
>
> I expected to use keydata here, but it's pretty easy to make a fake
> strmap_entry because of the use of the "key" pointer. So that makes
> sense.
>
> > +static void strmap_free_entries_(struct strmap *map, int free_util)
>
> You use the term "value" for the mapped-to value in this iteration. So
> perhaps free_values here (and in other functions) would be a better
> name?

Oops, yes, definitely.

> > +     /*
> > +      * We need to iterate over the hashmap entries and free
> > +      * e->key and e->value ourselves; hashmap has no API to
> > +      * take care of that for us.  Since we're already iterating over
> > +      * the hashmap, though, might as well free e too and avoid the need
> > +      * to make some call into the hashmap API to do that.
> > +      */
> > +     hashmap_for_each_entry(&map->map, &iter, e, ent) {
> > +             if (free_util)
> > +                     free(e->value);
> > +             if (map->strdup_strings)
> > +                     free((char*)e->key);
> > +             free(e);
> > +     }
> > +}
>
> Yep, makes sense.
>
> > +void strmap_clear(struct strmap *map, int free_util)
> > +{
> > +     strmap_free_entries_(map, free_util);
> > +     hashmap_free(&map->map);
> > +}
>
> This made me wonder about a partial_clear(), but it looks like that
> comes later.
>
> > +void *strmap_put(struct strmap *map, const char *str, void *data)
> > +{
> > +     struct strmap_entry *entry = find_strmap_entry(map, str);
> > +     void *old = NULL;
> > +
> > +     if (entry) {
> > +             old = entry->value;
> > +             entry->value = data;
>
> Here's a weird hypothetical. If strdup_strings is not set and I do:
>
>   const char *one = xstrdup("foo");
>   const char *two = xstrdup("foo");
>
>   hashmap_put(map, one, x);
>   hashmap_put(map, two, y);
>
> it's clear that the value should be pointing to "y" afterwards (and you
> return "x" so the caller can free it or whatever, good).
>
> But which key should the entry be pointing to? The old one or the new
> one? I'm trying and failing to think of a case where it would matter.
> Certainly I could add a free() to the toy above where it would, but it
> feels like a real caller would have to have pretty convoluted memory
> lifetime semantics for it to make a difference.
>
> So I'm not really asking for a particular behavior, but just bringing it
> up in case you can think of something relevant.

I'll keep mulling it over, but I likewise can't currently think of a
case where it'd matter.

>
> > +     } else {
> > +             /*
> > +              * We won't modify entry->key so it really should be const.
> > +              */
> > +             const char *key = str;
>
> The "should be" here confused me. It _is_ const. I'd probably just
> delete the comment entirely, but perhaps:
>
>   /*
>    * We'll store a const pointer. For non-duplicated strings, they belong
>    * to the caller and we received them as const in the first place. For
>    * our duplicated ones, they do point to memory we own, but they're
>    * still conceptually constant within the lifetime of an entry.
>    */
>
> Though it might make more sense in the struct definition, not here.

Either I was (mistakenly) worried about "I'm going to allocate and
copy, but during the copy it isn't actually const", or this is a
leftover artifact from some of the other iterations I tried.  Anyway,
I think this comment isn't useful; I'll just strike it.

> > +void *strmap_get(struct strmap *map, const char *str)
> > +{
> > +     struct strmap_entry *entry = find_strmap_entry(map, str);
> > +     return entry ? entry->value : NULL;
> > +}
>
> Just noting that the caller can't tell the difference between "no such
> entry" and "the entry is storing NULL". I think the simplicity offered
> by this interface makes it worth having (and being the primary one). If
> some caller really needs to tell the difference between the two, we can
> add another function later.
>
> Obviously they could use strmap_contains(), but that would mean two hash
> lookups.

Yep, addressed later by strmap_get_entry() in another patch, as you
noticed in your later review.

> > +/*
> > + * Same as strmap_init, but for those who want to control the memory management
> > + * carefully instead of using the default of strdup_strings=1.
> > + * (OCD = Obsessive Compulsive Disorder, a joke that those who use this function
> > + * are obsessing over minor details.)
> > + */
> > +void strmap_ocd_init(struct strmap *map,
> > +                  int strdup_strings);
>
> I'm not personally bothered by this name, but I wonder if some people
> may be (because they have or know somebody who actually has OCD).
>
> Perhaps strmap_init_with_options() would be a better name? It likewise
> would extend well if we want to add other non-default options later.

Doh!  That's going to push a bunch of lines past 80 characters.  Sigh...

It's probably a better name though; I'll change it.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux