On Sat, Apr 1, 2023 at 1:05 PM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote: > > The BPF hashmap uses the jhash() hash function. There is an optimized version > of this hash function which may be used if hash size is a multiple of 4. Apply > this optimization to the hashmap in a similar way as it is done in the bloom > filter map. > > On practice the optimization is only noticeable for smaller key sizes, which, > however, is sufficient for many applications. An example is listed in the > following table of measurements (a hashmap of 65536 elements was used): > > -------------------------------------------------------------------- > | key_size | fullness | lookups /sec | lookups (opt) /sec | gain | > -------------------------------------------------------------------- > | 4 | 25% | 42.990M | 46.000M | 7.0% | > | 4 | 50% | 37.910M | 39.094M | 3.1% | > | 4 | 75% | 34.486M | 36.124M | 4.7% | > | 4 | 100% | 31.760M | 32.719M | 3.0% | > -------------------------------------------------------------------- > | 8 | 25% | 43.855M | 49.626M | 13.2% | > | 8 | 50% | 38.328M | 42.152M | 10.0% | > | 8 | 75% | 34.483M | 38.088M | 10.5% | > | 8 | 100% | 31.306M | 34.686M | 10.8% | > -------------------------------------------------------------------- > | 12 | 25% | 38.398M | 43.770M | 14.0% | > | 12 | 50% | 33.336M | 37.712M | 13.1% | > | 12 | 75% | 29.917M | 34.440M | 15.1% | > | 12 | 100% | 27.322M | 30.480M | 11.6% | > -------------------------------------------------------------------- > | 16 | 25% | 41.491M | 41.921M | 1.0% | > | 16 | 50% | 36.206M | 36.474M | 0.7% | > | 16 | 75% | 32.529M | 33.027M | 1.5% | > | 16 | 100% | 29.581M | 30.325M | 2.5% | > -------------------------------------------------------------------- > | 20 | 25% | 34.240M | 36.787M | 7.4% | > | 20 | 50% | 30.328M | 32.663M | 7.7% | > | 20 | 75% | 27.536M | 29.354M | 6.6% | > | 20 | 100% | 24.847M | 26.505M | 6.7% | > -------------------------------------------------------------------- > | 24 | 25% | 36.329M | 40.608M | 11.8% | > | 24 | 50% | 31.444M | 35.059M | 11.5% | > | 24 | 75% | 28.426M | 31.452M | 10.6% | > | 24 | 100% | 26.278M | 28.741M | 9.4% | > -------------------------------------------------------------------- > | 28 | 25% | 31.540M | 31.944M | 1.3% | > | 28 | 50% | 27.739M | 28.063M | 1.2% | > | 28 | 75% | 24.993M | 25.814M | 3.3% | > | 28 | 100% | 23.513M | 23.500M | -0.1% | > -------------------------------------------------------------------- > | 32 | 25% | 32.116M | 33.953M | 5.7% | > | 32 | 50% | 28.879M | 29.859M | 3.4% | > | 32 | 75% | 26.227M | 26.948M | 2.7% | > | 32 | 100% | 23.829M | 24.613M | 3.3% | > -------------------------------------------------------------------- > | 64 | 25% | 22.535M | 22.554M | 0.1% | > | 64 | 50% | 20.471M | 20.675M | 1.0% | > | 64 | 75% | 19.077M | 19.146M | 0.4% | > | 64 | 100% | 17.710M | 18.131M | 2.4% | > -------------------------------------------------------------------- > > The following script was used to gather the results (SMT & frequency off): > > cd tools/testing/selftests/bpf > for key_size in 4 8 12 16 20 24 28 32 64; do > for nr_entries in `seq 16384 16384 65536`; do > fullness=$(printf '%3s' $((nr_entries*100/65536))) > echo -n "key_size=$key_size: $fullness% full: " > sudo ./bench -d2 -a bpf-hashmap-lookup --key_size=$key_size --nr_entries=$nr_entries --max_entries=65536 --nr_loops=2000000 --map_flags=0x40 | grep cpu > done > echo > done > > Signed-off-by: Anton Protopopov <aspsk@xxxxxxxxxxxxx> > --- > > v1->v2: > - simplify/optimize code by just testing the (key_len%4 == 0) in hot path (Alexei) > > kernel/bpf/hashtab.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c > index 96b645bba3a4..00c253b84bf5 100644 > --- a/kernel/bpf/hashtab.c > +++ b/kernel/bpf/hashtab.c > @@ -607,6 +607,8 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) > > static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd) > { > + if (likely(key_len % 4 == 0)) > + return jhash2(key, key_len / 4, hashrnd); > return jhash(key, key_len, hashrnd); > } This looks much cleaner than v1. Applied. Do you mind doing similar patch for bloomfilter? (removing aligned_u32_count variable)