On 6/9/22 11:49 AM, Alexei Starovoitov wrote:
> On Thu, Jun 9, 2022 at 7:27 AM Dave Marchevsky <davemarchevsky@xxxxxx> wrote:
>>>> +
>>>> +	if (use_hashmap) {
>>>> +		idx = bpf_get_prandom_u32() % hashmap_num_keys;
>>>> +		bpf_map_lookup_elem(inner_map, &idx);
>>> Is the hashmap populated ?
>>>
>>
>> Nope. Do you expect this to make a difference? Will try when confirming key /
>> val size above.
>
> Martin brought up an important point.
> The map should be populated.
> If the map is empty lookup_nulls_elem_raw() will select a bucket,
> it will be empty and it will return NULL.
> Whereas the more accurate apples to apples comparison
> would be to find a task in a map, since bpf_task_storage_get(,F_CREATE);
> will certainly find it.
> Then if (l->hash == hash && !memcmp ... will be triggered.
> When we're counting nsecs that should be noticeable.

Prepopulating the hashmap before running the benchmark does indeed have a
significant effect (2-3x slower):

Hashmap Control
===============
num keys: 10
hashmap (control) sequential get: hits throughput: 21.193 ± 0.479 M ops/s, hits latency: 47.185 ns/op, important_hits throughput: 21.193 ± 0.479 M ops/s
num keys: 1000
hashmap (control) sequential get: hits throughput: 13.515 ± 0.321 M ops/s, hits latency: 73.992 ns/op, important_hits throughput: 13.515 ± 0.321 M ops/s
num keys: 10000
hashmap (control) sequential get: hits throughput: 6.087 ± 0.085 M ops/s, hits latency: 164.294 ns/op, important_hits throughput: 6.087 ± 0.085 M ops/s
num keys: 100000
hashmap (control) sequential get: hits throughput: 3.860 ± 0.617 M ops/s, hits latency: 259.067 ns/op, important_hits throughput: 3.860 ± 0.617 M ops/s
num keys: 4194304
hashmap (control) sequential get: hits throughput: 1.918 ± 0.017 M ops/s, hits latency: 521.286 ns/op, important_hits throughput: 1.918 ± 0.017 M ops/s

vs the empty hashmap's:

Hashmap Control
===============
num keys: 10
hashmap (control) sequential get: hits throughput: 33.748 ± 0.700 M ops/s, hits latency: 29.631 ns/op, important_hits throughput: 33.748 ± 0.700 M ops/s
num keys: 1000
hashmap (control) sequential get: hits throughput: 29.997 ± 0.953 M ops/s, hits latency: 33.337 ns/op, important_hits throughput: 29.997 ± 0.953 M ops/s
num keys: 10000
hashmap (control) sequential get: hits throughput: 22.828 ± 1.114 M ops/s, hits latency: 43.805 ns/op, important_hits throughput: 22.828 ± 1.114 M ops/s
num keys: 100000
hashmap (control) sequential get: hits throughput: 17.595 ± 0.225 M ops/s, hits latency: 56.834 ns/op, important_hits throughput: 17.595 ± 0.225 M ops/s
num keys: 4194304
hashmap (control) sequential get: hits throughput: 7.098 ± 0.757 M ops/s, hits latency: 140.878 ns/op, important_hits throughput: 7.098 ± 0.757 M ops/s
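For reference, "prepopulating" here just means inserting every key the BPF side
can look up from userspace before the measurement loop starts, so
lookup_nulls_elem_raw() lands on a populated bucket instead of returning NULL.
A minimal sketch of that (not the exact v6 code; the helper name, the map fd
handling and the long-sized value are assumptions):

	#include <bpf/bpf.h>

	/* Sketch: fill keys 0..num_keys-1, matching the BPF side's
	 * bpf_get_prandom_u32() % hashmap_num_keys key selection.
	 * Value type/size is illustrative only.
	 */
	static void prepopulate_hashmap(int map_fd, unsigned int num_keys)
	{
		unsigned int key;
		long val = 1;

		for (key = 0; key < num_keys; key++)
			bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
	}
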
Bumping key size to u64 + 64 chars (72 bytes total), without prepopulating the
hashmap, results in a significant increase as well:

Hashmap Control
===============
num keys: 10
hashmap (control) sequential get: hits throughput: 16.613 ± 0.693 M ops/s, hits latency: 60.193 ns/op, important_hits throughput: 16.613 ± 0.693 M ops/s
num keys: 1000
hashmap (control) sequential get: hits throughput: 17.053 ± 0.137 M ops/s, hits latency: 58.640 ns/op, important_hits throughput: 17.053 ± 0.137 M ops/s
num keys: 10000
hashmap (control) sequential get: hits throughput: 15.088 ± 0.131 M ops/s, hits latency: 66.276 ns/op, important_hits throughput: 15.088 ± 0.131 M ops/s
num keys: 100000
hashmap (control) sequential get: hits throughput: 12.357 ± 0.050 M ops/s, hits latency: 80.928 ns/op, important_hits throughput: 12.357 ± 0.050 M ops/s
num keys: 4194304
hashmap (control) sequential get: hits throughput: 5.627 ± 0.266 M ops/s, hits latency: 177.725 ns/op, important_hits throughput: 5.627 ± 0.266 M ops/s

Whereas bumping the value size without prepopulating results in no significant
change from baseline.

I will send a v6 with a prepopulated hashmap.
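(For clarity, the 72-byte key in the key-size experiment above is just a u64
followed by 64 chars; as a C struct that is roughly the following, with the
struct and field names being illustrative rather than the benchmark's:

	#include <linux/types.h>

	struct bench_hashmap_key {	/* hypothetical name */
		__u64 id;		/* 8 bytes */
		char  data[64];		/* 64 bytes -> 72 bytes total */
	};
)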