Re: [PATCH bpf-next 0/6] New benchmark for hashmap lookups

On 23/01/30 04:17, Andrii Nakryiko wrote:
> On Fri, Jan 27, 2023 at 10:14 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> >
> > Add a new benchmark for hashmap lookups and fix several typos. See individual
> > commits for descriptions.
> >
> > One thing to mention here is that in commit 3 I've patched bench so that
> > command line options can now be reused by different benchmarks.
> >
> > The benchmark itself is added in the last commit, 6. I am using this benchmark
> > to test map lookup performance when using a different hash function (see
> > https://fosdem.org/2023/schedule/event/bpf_hashing/). The results provided by
> > the benchmark look reasonable and match the results of my other benchmarks
> > (which required patching the kernel to get actual statistics on map lookups).
> 
> Could you share the results with us? Curious which hash functions you
> tried and which ones are the most promising :)

For the longer version with pictures, see the talk I've referenced above (it's
at FOSDEM this coming Sunday, Feb 5). A short version follows.

The xxh3 hash works fine for big keys, where "big" differs between
architectures and map sizes. On my Intel i7 machine this means key size >= 8.
On my AMD machine xxh3 works better for all key sizes with small maps, but
degrades for keys of size 12, 16, 20 with bigger maps (>= 200K elements or so).
Example (map size 100K, 50% full, measuring M ops/second):

    hash_size    4    8   12   16   20   24   28   32   36   40   44   48   52   56   60   64
    orig_map  15.7 15.4 14.2 13.9 13.1 13.2 12.0 12.0 11.5 11.2 10.6 10.7 10.0 10.0  9.6  9.3
    new_map   15.5 15.9 15.2 15.3 14.3 14.6 14.0 14.2 13.3 13.6 13.1 13.4 12.7 13.1 12.3 12.8

A smaller map (10K, 50% full):

    hash_size    4    8   12   16   20   24   28   32   36   40   44   48   52   56   60   64
    orig_map  36.1 36.8 32.1 32.4 29.0 29.1 26.2 26.4 23.9 24.3 21.8 22.5 20.4 20.7 19.3 19.1
    new_map   37.7 39.6 34.0 36.5 31.5 34.1 30.7 32.7 28.1 29.5 27.4 28.8 27.1 28.1 26.4 27.8

Other hash functions I've tried (the older xxh32/xxh64, spooky) work _way_
worse for small keys than jhash/xxh3. (Answering a possible question about
vector instructions: xxh3 stays scalar for key sizes <= 240; for larger keys
xxh64/xxh32 can be used, which also gives a ~2x map lookup speed gain compared
to jhash for keys > 240.)

Bloom filters with big keys (>= ~40 bytes), predictably, work way faster with
xxh3 than with jhash. (Why isn't it similar for the hashmap? Because Bloom
filters use jhash2 for keys whose size is a multiple of 4, which works faster
than jhash for small keys; see also the patch below.)

The stacktrace map doesn't get much faster, because it spends 95% of its time
in get_perf_callchain. (The hash part, though, runs ~1.5-2.0x faster with
xxh3; since the hashed size is typically about 60-90 bytes, it makes sense to
use xxh3 in stacktrace by default.)

One very simple change which brings a 5-10% speed gain for all hashmaps is this:

 static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
 {
+	if (likely((key_len & 0x3) == 0))
+		return jhash2(key, key_len >> 2, hashrnd);
 	return jhash(key, key_len, hashrnd);
 }

I will follow up with a patch as simple as this ^, or with a combination of
jhash, jhash2, and xxh3, once I've run benchmarks on more architectures to
check that there are no regressions.


