Re: [PATCH bpf-next 05/12] libbpf: add resizable non-thread safe internal hashmap

Stanislav Fomichev <sdf@xxxxxxxxxxx> · Wed, 22 May 2019 15:30:13 -0700



On 05/22, Andrii Nakryiko wrote:
> On Wed, May 22, 2019 at 1:30 PM Stanislav Fomichev <sdf@xxxxxxxxxxx> wrote:
> >
> > On 05/22, Andrii Nakryiko wrote:
> > > There is a need for fast point lookups inside libbpf for multiple use
> > > cases (e.g., name resolution for BTF-to-C conversion, by-name lookups in
> > > BTF for upcoming BPF CO-RE relocation support, etc). This patch
> > > implements simple resizable non-thread safe hashmap using single linked
> > > list chains.
> > Didn't really look into the details, but any reason you're not using
> > linux/hashtable.h? It's exported in tools/include and I think perf
> > is using it. It's probably not resizable, but should be easy to
> > implement rebalancing on top of it.
> 
> There are multiple reasons.
> 1. linux/hashtable.h is pretty bare-bones, it's just hlist_node and a
> bunch of macro to manipulate array or chains of them. I wanted to have
> higher-level API with lookup by key, insertion w/ various strategies,
> etc. Preferrably one not requiring to manipulate hlist_node directly
> as part of its API, even if at some performance cost of hiding that
> low-level detail.
> 2. Resizing is a big chunk of resizable hashmap logic, so I'd need to
> write a bunch of additional code anyway.
> 3. Licensing. linux/hashtable.h is under GPL, while libbpf is
> dual-licensed under GPL and BSD. When syncing libbpf from kernel to
> github, we have to re-implement all the parts from kernel that are not
> under BSD license anyway.
> 4. hlist_node keeps two pointers per item, which is unnecessary for
> hashmap which does deletion by key (by searching for node first, then
> deleting), so we can also have lower memory overhead per entry.
> 
> So in general, I feel like there is little benefit to reusing
> linux/hashlist.h for use cases I'm targeting this hashmap for.
Makes sense. Licensing is probably the biggest issue here because
my original suggestion was to use linux/hashtable.h internally,
just wrap it in a nice api.
But agree on all points, thanks for clarification!

> > > Four different insert strategies are supported:
> > >  - HASHMAP_ADD - only add key/value if key doesn't exist yet;
> > >  - HASHMAP_SET - add key/value pair if key doesn't exist yet; otherwise,
> > >    update value;
> > >  - HASHMAP_UPDATE - update value, if key already exists; otherwise, do
> > >    nothing and return -ENOENT;
> > >  - HASHMAP_APPEND - always add key/value pair, even if key already exists.
> > >    This turns hashmap into a multimap by allowing multiple values to be
> > >    associated with the same key. Most useful read API for such hashmap is
> > >    hashmap__for_each_key_entry() iteration. If hashmap__find() is still
> > >    used, it will return last inserted key/value entry (first in a bucket
> > >    chain).
> > >
> > > For HASHMAP_SET and HASHMAP_UPDATE, old key/value pair is returned, so
> > > that calling code can handle proper memory management, if necessary.
> > >
> > > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
> > > ---
> > >  tools/lib/bpf/Build     |   2 +-
> > >  tools/lib/bpf/hashmap.c | 229 ++++++++++++++++++++++++++++++++++++++++
> > >  tools/lib/bpf/hashmap.h | 173 ++++++++++++++++++++++++++++++
> > >  3 files changed, 403 insertions(+), 1 deletion(-)
> > >  create mode 100644 tools/lib/bpf/hashmap.c
> > >  create mode 100644 tools/lib/bpf/hashmap.h
> > >
> > >