Re: [PATCH v6 bpf-next] selftests/bpf: Add benchmark for local_storage get

Dave Marchevsky wrote:
> On 6/22/22 9:26 PM, John Fastabend wrote:   
> > Martin KaFai Lau wrote:
> >> On Tue, Jun 21, 2022 at 10:49:46PM -0700, John Fastabend wrote:
> >>> Martin KaFai Lau wrote:
> >>>> On Tue, Jun 21, 2022 at 12:17:54PM -0700, John Fastabend wrote:
> >>>>>> Hashmap Control
> >>>>>> ===============
> >>>>>>         num keys: 10
> >>>>>> hashmap (control) sequential    get:  hits throughput: 20.900 ± 0.334 M ops/s, hits latency: 47.847 ns/op, important_hits throughput: 20.900 ± 0.334 M ops/s
> >>>>>>
> >>>>>>         num keys: 1000
> >>>>>> hashmap (control) sequential    get:  hits throughput: 13.758 ± 0.219 M ops/s, hits latency: 72.683 ns/op, important_hits throughput: 13.758 ± 0.219 M ops/s
> >>>>>>
> >>>>>>         num keys: 10000
> >>>>>> hashmap (control) sequential    get:  hits throughput: 6.995 ± 0.034 M ops/s, hits latency: 142.959 ns/op, important_hits throughput: 6.995 ± 0.034 M ops/s
> >>>>>>
> >>>>>>         num keys: 100000
> >>>>>> hashmap (control) sequential    get:  hits throughput: 4.452 ± 0.371 M ops/s, hits latency: 224.635 ns/op, important_hits throughput: 4.452 ± 0.371 M ops/s
> >>>>>>
> >>>>>>         num keys: 4194304
> >>>>>> hashmap (control) sequential    get:  hits throughput: 3.043 ± 0.033 M ops/s, hits latency: 328.587 ns/op, important_hits throughput: 3.043 ± 0.033 M ops/s
> >>>>>>
> >>>>>
> >>>>> Why is the hashmap lookup not constant with the number of keys? It looks
> >>>>> like it's prepopulated without collisions, so I wouldn't expect any
> >>>>> extra ops on the lookup side after a quick look at the code.
> >>>> It may be due to the cpu-cache misses as the map grows.
> >>>
> >>> Maybe, but the values are just ints, so even 1k * 4B = 4kB should fit
> >>> comfortably in the caches of an otherwise unused server-class system.
> >>> Would be more believable (to me at least) if the drop-off happened at
> >>> 100k or more.
> >> It is not only the value (and key) size.  There is per-element overhead:
> >> htab_elem alone is 48 bytes, and the key and value each need to be
> >> 8-byte aligned as well.
> >>
> > 
> > Right, my late-night math didn't add up. Now I'm wondering if we can make
> > the hashmap behave much better; that drop-off is looking really ugly.
> > 
> >> From a random machine:
> >> lscpu -C
> >> NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL  SETS PHY-LINE COHERENCY-SIZE
> >> L1d       32K     576K    8 Data            1    64        1             64
> >> L1i       32K     576K    8 Instruction     1    64        1             64
> >> L2         1M      18M   16 Unified         2  1024        1             64
> >> L3      24.8M    24.8M   11 Unified         3 36864        1             64
> > 
> > Could you do a couple more data points then, num keys=100,200,400? I would
> > expect those to fit in the cache and be the same as 10 by the cache theory.
> > I could try as well, but it's looking like Friday before I have a spare moment.
> > 
> > Thanks,
> > John
> 
> Here's a benchmark run with those num_keys.
> 
> Hashmap Control
> ===============
>         num keys: 10
> hashmap (control) sequential    get:  hits throughput: 23.072 ± 0.208 M ops/s, hits latency: 43.343 ns/op, important_hits throughput: 23.072 ± 0.208 M ops/s
> 
>         num keys: 100
> hashmap (control) sequential    get:  hits throughput: 17.967 ± 0.236 M ops/s, hits latency: 55.659 ns/op, important_hits throughput: 17.967 ± 0.236 M ops/s
> 

Hmm, this is interesting. I expected a flat line and then a hard drop-off,
which to me would indicate cache misses being the culprit. But this is an
almost linear perf decrease over num_keys. Guess we need to debug more.
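
Rough back-of-the-envelope, using the 48-byte htab_elem and 8-byte key/value
rounding Martin mentioned (exact layout varies by kernel version/config, and
the bucket-array term below is only a crude guess):

#include <stdio.h>

static unsigned long working_set_bytes(unsigned long nr_keys)
{
	/* 48-byte htab_elem + 4-byte key and 4-byte value, each rounded up to 8 */
	unsigned long elem = 48 + 8 + 8;
	/* crude guess for the bucket array: ~16 bytes per bucket, roughly one per key */
	unsigned long buckets = nr_keys * 16;

	return nr_keys * elem + buckets;
}

int main(void)
{
	unsigned long keys[] = { 10, 100, 1000, 10000, 100000, 4194304 };
	unsigned int i;

	for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
		printf("%8lu keys -> ~%lu KB\n", keys[i],
		       working_set_bytes(keys[i]) / 1024);
	return 0;
}

By that estimate ~1k keys is already past the 32K L1d, and the 10k/100k/4M
runs land in L2/L3/DRAM territory, which would fit a gradual slide across
cache levels rather than one cliff. It still doesn't explain the dip from 10
to 100 keys (well within L1d by this estimate), so perf counters are probably
the way to settle it.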

>         num keys: 200
> hashmap (control) sequential    get:  hits throughput: 17.812 ± 0.428 M ops/s, hits latency: 56.143 ns/op, important_hits throughput: 17.812 ± 0.428 M ops/s
> 
>         num keys: 300
> hashmap (control) sequential    get:  hits throughput: 17.070 ± 0.293 M ops/s, hits latency: 58.582 ns/op, important_hits throughput: 17.070 ± 0.293 M ops/s
> 
>         num keys: 400
> hashmap (control) sequential    get:  hits throughput: 17.667 ± 0.316 M ops/s, hits latency: 56.604 ns/op, important_hits throughput: 17.667 ± 0.316 M ops/s
> 
>         num keys: 500
> hashmap (control) sequential    get:  hits throughput: 17.010 ± 0.409 M ops/s, hits latency: 58.789 ns/op, important_hits throughput: 17.010 ± 0.409 M ops/s
> 
>         num keys: 1000
> hashmap (control) sequential    get:  hits throughput: 14.330 ± 0.172 M ops/s, hits latency: 69.784 ns/op, important_hits throughput: 14.330 ± 0.172 M ops/s
> 
>         num keys: 10000
> hashmap (control) sequential    get:  hits throughput: 6.047 ± 0.024 M ops/s, hits latency: 165.380 ns/op, important_hits throughput: 6.047 ± 0.024 M ops/s
> 
>         num keys: 100000
> hashmap (control) sequential    get:  hits throughput: 4.472 ± 0.163 M ops/s, hits latency: 223.630 ns/op, important_hits throughput: 4.472 ± 0.163 M ops/s
> 
>         num keys: 4194304
> hashmap (control) sequential    get:  hits throughput: 2.785 ± 0.024 M ops/s, hits latency: 359.066 ns/op, important_hits throughput: 2.785 ± 0.024 M ops/s
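
If it helps with the digging, here's a rough, untested userspace sketch
(hypothetical reproducer, not the selftest: the bench does its lookups from a
BPF program, so the syscall overhead here will dominate the absolute numbers)
that just prepopulates a plain BPF_MAP_TYPE_HASH and sequentially gets every
key, to see whether the relative scaling with num_keys shows up from user
space as well. Needs root and a libbpf with bpf_map_create(); build with
-lbpf and pass num_keys as the first argument:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <bpf/bpf.h>

int main(int argc, char **argv)
{
	unsigned int nr_keys = argc > 1 ? atoi(argv[1]) : 1000;
	unsigned int rounds = 100, key, r;
	struct timespec t0, t1;
	int val = 0, fd;
	double ns;

	/* plain hashmap with a 4-byte key and 4-byte value */
	fd = bpf_map_create(BPF_MAP_TYPE_HASH, "seq_get_repro", sizeof(key),
			    sizeof(val), nr_keys, NULL);
	if (fd < 0) {
		fprintf(stderr, "bpf_map_create failed: %d\n", fd);
		return 1;
	}

	/* prepopulate every key, then sequentially look them all up */
	for (key = 0; key < nr_keys; key++)
		bpf_map_update_elem(fd, &key, &val, BPF_ANY);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (r = 0; r < rounds; r++)
		for (key = 0; key < nr_keys; key++)
			bpf_map_lookup_elem(fd, &key, &val);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
	printf("%u keys: %.1f ns/lookup (incl. syscall overhead)\n",
	       nr_keys, ns / ((double)rounds * nr_keys));
	return 0;
}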
