On 6/9/22 11:49 AM, Alexei Starovoitov wrote:
> On Thu, Jun 9, 2022 at 7:27 AM Dave Marchevsky <davemarchevsky@xxxxxx> wrote:
>>>> +
>>>> +	if (use_hashmap) {
>>>> +		idx = bpf_get_prandom_u32() % hashmap_num_keys;
>>>> +		bpf_map_lookup_elem(inner_map, &idx);
>>> Is the hashmap populated ?
>>>
>>
>> Nope. Do you expect this to make a difference? Will try when confirming key /
>> val size above.
>
> Martin brought up an important point.
> The map should be populated.
> If the map is empty lookup_nulls_elem_raw() will select a bucket,
> it will be empty and it will return NULL.
> Whereas the more accurate apples to apples comparison
> would be to find a task in a map, since bpf_task_storage_get(,F_CREATE);
> will certainly find it.
> Then if (l->hash == hash && !memcmp ... will be triggered.
> When we're counting nsecs that should be noticeable.

Prepopulating the hashmap before running the benchmark does indeed have a
significant effect (2-3x slower):

Hashmap Control
===============
num keys: 10
hashmap (control) sequential get: hits throughput: 21.193 ± 0.479 M ops/s, hits latency: 47.185 ns/op, important_hits throughput: 21.193 ± 0.479 M ops/s
num keys: 1000
hashmap (control) sequential get: hits throughput: 13.515 ± 0.321 M ops/s, hits latency: 73.992 ns/op, important_hits throughput: 13.515 ± 0.321 M ops/s
num keys: 10000
hashmap (control) sequential get: hits throughput: 6.087 ± 0.085 M ops/s, hits latency: 164.294 ns/op, important_hits throughput: 6.087 ± 0.085 M ops/s
num keys: 100000
hashmap (control) sequential get: hits throughput: 3.860 ± 0.617 M ops/s, hits latency: 259.067 ns/op, important_hits throughput: 3.860 ± 0.617 M ops/s
num keys: 4194304
hashmap (control) sequential get: hits throughput: 1.918 ± 0.017 M ops/s, hits latency: 521.286 ns/op, important_hits throughput: 1.918 ± 0.017 M ops/s

vs the empty hashmap's:

Hashmap Control
===============
num keys: 10
hashmap (control) sequential get: hits throughput: 33.748 ± 0.700 M ops/s, hits latency: 29.631 ns/op, important_hits throughput: 33.748 ± 0.700 M ops/s
num keys: 1000
hashmap (control) sequential get: hits throughput: 29.997 ± 0.953 M ops/s, hits latency: 33.337 ns/op, important_hits throughput: 29.997 ± 0.953 M ops/s
num keys: 10000
hashmap (control) sequential get: hits throughput: 22.828 ± 1.114 M ops/s, hits latency: 43.805 ns/op, important_hits throughput: 22.828 ± 1.114 M ops/s
num keys: 100000
hashmap (control) sequential get: hits throughput: 17.595 ± 0.225 M ops/s, hits latency: 56.834 ns/op, important_hits throughput: 17.595 ± 0.225 M ops/s
num keys: 4194304
hashmap (control) sequential get: hits throughput: 7.098 ± 0.757 M ops/s, hits latency: 140.878 ns/op, important_hits throughput: 7.098 ± 0.757 M ops/s
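For reference, "prepopulating" here just means inserting every key the BPF side
can look up from userspace before the measurement loop starts, so
lookup_nulls_elem_raw() lands on a populated bucket instead of returning NULL.
A minimal sketch of that (not the exact v6 code; the helper name, the map fd
handling and the long-sized value are assumptions):

	#include <bpf/bpf.h>

	/* Sketch: fill keys 0..num_keys-1, matching the BPF side's
	 * bpf_get_prandom_u32() % hashmap_num_keys key selection.
	 * Value type/size is illustrative only.
	 */
	static void prepopulate_hashmap(int map_fd, unsigned int num_keys)
	{
		unsigned int key;
		long val = 1;

		for (key = 0; key < num_keys; key++)
			bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
	}
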
Bumping key size to u64 + 64 chars (72 bytes total), without prepopulating the
hashmap, results in a significant increase as well:

Hashmap Control
===============
num keys: 10
hashmap (control) sequential get: hits throughput: 16.613 ± 0.693 M ops/s, hits latency: 60.193 ns/op, important_hits throughput: 16.613 ± 0.693 M ops/s
num keys: 1000
hashmap (control) sequential get: hits throughput: 17.053 ± 0.137 M ops/s, hits latency: 58.640 ns/op, important_hits throughput: 17.053 ± 0.137 M ops/s
num keys: 10000
hashmap (control) sequential get: hits throughput: 15.088 ± 0.131 M ops/s, hits latency: 66.276 ns/op, important_hits throughput: 15.088 ± 0.131 M ops/s
num keys: 100000
hashmap (control) sequential get: hits throughput: 12.357 ± 0.050 M ops/s, hits latency: 80.928 ns/op, important_hits throughput: 12.357 ± 0.050 M ops/s
num keys: 4194304
hashmap (control) sequential get: hits throughput: 5.627 ± 0.266 M ops/s, hits latency: 177.725 ns/op, important_hits throughput: 5.627 ± 0.266 M ops/s

Whereas bumping the value size without prepopulating results in no significant
change from baseline.

I will send a v6 with a prepopulated hashmap.
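(For clarity, the 72-byte key in the key-size experiment above is just a u64
followed by 64 chars; as a C struct that is roughly the following, with the
struct and field names being illustrative rather than the benchmark's:

	#include <linux/types.h>

	struct bench_hashmap_key {	/* hypothetical name */
		__u64 id;		/* 8 bytes */
		char  data[64];		/* 64 bytes -> 72 bytes total */
	};
)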