On 05/22, Andrii Nakryiko wrote: > On Wed, May 22, 2019 at 1:30 PM Stanislav Fomichev <sdf@xxxxxxxxxxx> wrote: > > > > On 05/22, Andrii Nakryiko wrote: > > > There is a need for fast point lookups inside libbpf for multiple use > > > cases (e.g., name resolution for BTF-to-C conversion, by-name lookups in > > > BTF for upcoming BPF CO-RE relocation support, etc). This patch > > > implements simple resizable non-thread safe hashmap using single linked > > > list chains. > > Didn't really look into the details, but any reason you're not using > > linux/hashtable.h? It's exported in tools/include and I think perf > > is using it. It's probably not resizable, but should be easy to > > implement rebalancing on top of it. > > There are multiple reasons. > 1. linux/hashtable.h is pretty bare-bones, it's just hlist_node and a > bunch of macro to manipulate array or chains of them. I wanted to have > higher-level API with lookup by key, insertion w/ various strategies, > etc. Preferrably one not requiring to manipulate hlist_node directly > as part of its API, even if at some performance cost of hiding that > low-level detail. > 2. Resizing is a big chunk of resizable hashmap logic, so I'd need to > write a bunch of additional code anyway. > 3. Licensing. linux/hashtable.h is under GPL, while libbpf is > dual-licensed under GPL and BSD. When syncing libbpf from kernel to > github, we have to re-implement all the parts from kernel that are not > under BSD license anyway. > 4. hlist_node keeps two pointers per item, which is unnecessary for > hashmap which does deletion by key (by searching for node first, then > deleting), so we can also have lower memory overhead per entry. > > So in general, I feel like there is little benefit to reusing > linux/hashlist.h for use cases I'm targeting this hashmap for. Makes sense. Licensing is probably the biggest issue here because my original suggestion was to use linux/hashtable.h internally, just wrap it in a nice api. But agree on all points, thanks for clarification! > > > Four different insert strategies are supported: > > > - HASHMAP_ADD - only add key/value if key doesn't exist yet; > > > - HASHMAP_SET - add key/value pair if key doesn't exist yet; otherwise, > > > update value; > > > - HASHMAP_UPDATE - update value, if key already exists; otherwise, do > > > nothing and return -ENOENT; > > > - HASHMAP_APPEND - always add key/value pair, even if key already exists. > > > This turns hashmap into a multimap by allowing multiple values to be > > > associated with the same key. Most useful read API for such hashmap is > > > hashmap__for_each_key_entry() iteration. If hashmap__find() is still > > > used, it will return last inserted key/value entry (first in a bucket > > > chain). > > > > > > For HASHMAP_SET and HASHMAP_UPDATE, old key/value pair is returned, so > > > that calling code can handle proper memory management, if necessary. > > > > > > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx> > > > --- > > > tools/lib/bpf/Build | 2 +- > > > tools/lib/bpf/hashmap.c | 229 ++++++++++++++++++++++++++++++++++++++++ > > > tools/lib/bpf/hashmap.h | 173 ++++++++++++++++++++++++++++++ > > > 3 files changed, 403 insertions(+), 1 deletion(-) > > > create mode 100644 tools/lib/bpf/hashmap.c > > > create mode 100644 tools/lib/bpf/hashmap.h > > > > > >