Re: [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map

Elijah Newren <newren@xxxxxxxxx> · Fri, 30 Oct 2020 10:28:51 -0700

On Fri, Oct 30, 2020 at 7:39 AM Jeff King <peff@xxxxxxxx> wrote:
>
> On Tue, Oct 13, 2020 at 12:40:48AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > Although strmap could be used as a string->int map, one either had to
> > allocate an int for every entry and then deallocate later, or one had to
> > do a bunch of casting between (void*) and (intptr_t).
> >
> > Add some special functions that do the casting.  Also, rename put->set
> > for such wrapper functions since 'put' implied there may be some
> > deallocation needed if the string was already found in the map, which
> > isn't the case when we're storing an int value directly in the void*
> > slot instead of using the void* slot as a pointer to data.
>
> I think this is worth doing. That kind of casting is an implementation
> detail, and it's nice for callers not to have to see it.
>
> You might want to mention that this _could_ be done as just accessors to
> strmap, but using a separate struct provides type safety against
> misusing pointers as integers or vice versa.

If I just did it as accessors, it makes it harder for myself and
others to remember what my huge piles of strmaps in merge-ort do; I
found that it became easier to follow the code and remember what
things were doing when some were marked as strmap, some as strintmap,
and some as strset.

> > A note on the name: if anyone has a better name suggestion than
> > strintmap, I'm happy to take it.  It seems slightly unwieldy, but I have
> > not been able to come up with a better name.
>
> I still don't have a better suggestion on the name. Another convention
> could be to name map types as "map_from_to". So "struct map_str_int".
> But it's pretty ugly, and strmap would become "map_str_ptr" or
> something. As ugly as "strintmap" is, I like it better.
>
> > +void strintmap_incr(struct strintmap *map, const char *str, intptr_t amt)
> > +{
> > +     struct strmap_entry *entry = find_strmap_entry(&map->map, str);
> > +     if (entry) {
> > +             intptr_t *whence = (intptr_t*)&entry->value;
> > +             *whence += amt;
> > +     }
> > +     else
> > +             strintmap_set(map, str, amt);
> > +}
>
> Incrementing a missing entry auto-vivifies it at 0.  That makes perfect
> sense, but might be worth noting above the function in the header file.
>
> Though maybe it's a little weird since strintmap_get() takes a default
> value. Why don't we use that here? I'd have to see how its used, but
> would it make sense to set a default value when initializing the map,
> rather than providing it on each call?

That probably makes sense.  It turns out there is one strintmap for
which I call strintmap_get() in two different places with different
default values, but I think I can fix that up (one of them really
needed -1 as the default, while the other callsite just needed the
default to not accidentally match a specific enum value and 0 was
convenient).

>
> > +/*
> > + * strintmap:
> > + *    A map of string -> int, typecasting the void* of strmap to an int.
>
> Are the size and signedness of an int flexible enough for all uses?

If some users want signed values and others want unsigned, I'm not
sure how we can satisfy both.  Maybe make a struintmap?

Perhaps that could be added later if uses come up for it?  Some of my
uses need int, the rest of them wouldn't care about int vs unsigned.

> I doubt the standard makes any promises about the relationship between
> intptr_t and int, but I'd be surprised if any modern platform has an
> intptr_t that isn't at least as big as an int (on most 32-bit platforms
> they'll be the same, and on 64-bit ones intptr_t is strictly bigger).
>
> Would any callers care about using the full 32-bits, though? I.e., would
> they prefer casting through uintptr_t to an "unsigned int"?

I don't care about the full 32 bits (I'll probably use less than 16),
but I absolutely wanted it signed for my uses.  I think it makes sense
to be signed when using it for an index within an array (-1 for "not
found" makes sense; using arbitrary large numbers seems really ugly
(and perhaps buggy) to me).  It also makes sense to me to use -1 as an
invalid enum value, though I guess I could technically specify an
additional "INVALID_VALUE" within the enum and use it as the default.

If someone does care about the full range of bits up to 64 on relevant
platforms, I guess I should make it strintptr_t_map.  But besides the
egregiously ugly name, one advantage of int over intptr_t (or unsigned
over uintptr_t) is that you can use it in a printf easily:
   printf("Size: %d\n", strintmap_get(&foo, 0));
whereas if it strintmap_get() returns an intptr_t, then it's a royal
mess to attempt to portably use it without adding additional manual
casts.  Maybe I was just missing something obvious, but I couldn't
figure out the %d, %ld, %lld, PRIdMAX, etc. choices and get the
statement to compile on all platforms, so I'd always just cast to int
or unsigned at the time of calling printf.