Re: [PATCH] hashmap: hashmap_get_next passes through keydata as well

Jeff King <peff@xxxxxxxx> · Tue, 16 May 2017 12:13:15 -0400

On Sat, May 13, 2017 at 07:06:35AM -0700, Stefan Beller wrote:

> > I think I figured it out, but I have a feeling it is violating the
> > intent of the "keydata" parameter.  That parameter is typically used not
> > as a pointer to arbitrary auxiliary data, but as a trick for finding a
> > hash entry without having to allocate a struct for it.
> 
> Yes, I was violating the intent exactly as you describe. I'll adapt my patches
> accordingly.
> 
> I do not really buy into the trick though, because when the overhead of
> allocating a 'key' struct filling in the key parts only leaving out the value
> is so much more expensive than giving the key via this keydata argument,
> then there are serious issues with your data structure IMHO.
> Example:
> [...]

Sure, in your example it works to allocate a partial structure on the
stack. But if you look at the users of the hashmap, quite a few are
structs with final flex-array members, which cannot easily do that
without allocating a new struct on the heap.  You can work around it by
converting them to flex-ptrs (at the cost of an extra 8 bytes per
struct) and having the "key only" version be an oddball by pointing
outside the struct. But I do think having the flexibility in hashmap is
nice.

I actually wish that it did not even require the hashmap entry to be at
the beginning of the struct; it makes it hard to put the same structure
into multiple hashes/lists. See for example the pack MRU, which is in
both a hash and a doubly-linked list. Fortunately the list code is
flexible enough to allow its pointers anywhere in the struct.

So yeah, we could design all of our data structures to fit into the
hashmap's world-view. But I think it's handy for it to be flexible.

> > It works out in the current code because the chaining is purely linear,
> > and doesn't care about order. So we can rehash and just stick the
> > elements into a new list. But if it were switched out for a different
> > data structure (e.g., a tree), then the hashmap code would need to be
> > able to compare elements.
> 
> Note that most compare functions do not return an order, but only
> a boolean [no]match, so putting it into an ordered tree could only
> rely on the hash that we already know without aid from the compare function.
> Of course we could fix our compare functions before doing such a
> refactoring, but I want to point out how involved that would be.

Good point, I didn't think of that. Moving to any non-linear lookup
within a single bucket would require totally changing the cmp_fn
contract, and that's not likely to happen (and anyway, if we're starting
to care about intra-bucket lookup times, that's probably a sign we
should be growing the table more aggressively).

So my concern was just nonsense.

> > I don't think we have any particular plans for such a change, but I
> > wonder if we should avoid encouraging callers to rely on the current
> > implementation.
> 
> After a night of sleep it is easy to fix my code to behave as the API
> intended. Yesterday I could not see how to fix it.

Yay, then. :)

-Peff