On Sat, May 13, 2017 at 07:06:35AM -0700, Stefan Beller wrote: > > I think I figured it out, but I have a feeling it is violating the > > intent of the "keydata" parameter. That parameter is typically used not > > as a pointer to arbitrary auxiliary data, but as a trick for finding a > > hash entry without having to allocate a struct for it. > > Yes, I was violating the intent exactly as you describe. I'll adapt my patches > accordingly. > > I do not really buy into the trick though, because when the overhead of > allocating a 'key' struct filling in the key parts only leaving out the value > is so much more expensive than giving the key via this keydata argument, > then there are serious issues with your data structure IMHO. > Example: > [...] Sure, in your example it works to allocate a partial structure on the stack. But if you look at the users of the hashmap, quite a few are structs with final flex-array members, which cannot easily do that without allocating a new struct on the heap. You can work around it by converting them to flex-ptrs (at the cost of an extra 8 bytes per struct) and having the "key only" version be an oddball by pointing outside the struct. But I do think having the flexibility in hashmap is nice. I actually wish that it did not even require the hashmap entry to be at the beginning of the struct; it makes it hard to put the same structure into multiple hashes/lists. See for example the pack MRU, which is in both a hash and a doubly-linked list. Fortunately the list code is flexible enough to allow its pointers anywhere in the struct. So yeah, we could design all of our data structures to fit into the hashmap's world-view. But I think it's handy for it to be flexible. > > It works out in the current code because the chaining is purely linear, > > and doesn't care about order. So we can rehash and just stick the > > elements into a new list. But if it were switched out for a different > > data structure (e.g., a tree), then the hashmap code would need to be > > able to compare elements. > > Note that most compare functions do not return an order, but only > a boolean [no]match, so putting it into an ordered tree could only > rely on the hash that we already know without aid from the compare function. > Of course we could fix our compare functions before doing such a > refactoring, but I want to point out how involved that would be. Good point, I didn't think of that. Moving to any non-linear lookup within a single bucket would require totally changing the cmp_fn contract, and that's not likely to happen (and anyway, if we're starting to care about intra-bucket lookup times, that's probably a sign we should be growing the table more aggressively). So my concern was just nonsense. > > I don't think we have any particular plans for such a change, but I > > wonder if we should avoid encouraging callers to rely on the current > > implementation. > > After a night of sleep it is easy to fix my code to behave as the API > intended. Yesterday I could not see how to fix it. Yay, then. :) -Peff