Dave Marchevsky wrote:
> On 6/22/22 9:26 PM, John Fastabend wrote:
> > Martin KaFai Lau wrote:
> >> On Tue, Jun 21, 2022 at 10:49:46PM -0700, John Fastabend wrote:
> >>> Martin KaFai Lau wrote:
> >>>> On Tue, Jun 21, 2022 at 12:17:54PM -0700, John Fastabend wrote:
> >>>>>> Hashmap Control
> >>>>>> ===============
> >>>>>> num keys: 10
> >>>>>> hashmap (control) sequential get: hits throughput: 20.900 ± 0.334 M ops/s, hits latency: 47.847 ns/op, important_hits throughput: 20.900 ± 0.334 M ops/s
> >>>>>>
> >>>>>> num keys: 1000
> >>>>>> hashmap (control) sequential get: hits throughput: 13.758 ± 0.219 M ops/s, hits latency: 72.683 ns/op, important_hits throughput: 13.758 ± 0.219 M ops/s
> >>>>>>
> >>>>>> num keys: 10000
> >>>>>> hashmap (control) sequential get: hits throughput: 6.995 ± 0.034 M ops/s, hits latency: 142.959 ns/op, important_hits throughput: 6.995 ± 0.034 M ops/s
> >>>>>>
> >>>>>> num keys: 100000
> >>>>>> hashmap (control) sequential get: hits throughput: 4.452 ± 0.371 M ops/s, hits latency: 224.635 ns/op, important_hits throughput: 4.452 ± 0.371 M ops/s
> >>>>>>
> >>>>>> num keys: 4194304
> >>>>>> hashmap (control) sequential get: hits throughput: 3.043 ± 0.033 M ops/s, hits latency: 328.587 ns/op, important_hits throughput: 3.043 ± 0.033 M ops/s
> >>>>>>
> >>>>>
> >>>>> Why is the hashmap lookup not constant with the number of keys? It looks
> >>>>> like it's prepopulated without collisions, so I wouldn't expect any
> >>>>> extra ops on the lookup side after looking at the code quickly.
> >>>> It may be due to cpu-cache misses as the map grows.
> >>>
> >>> Maybe, but values are just ints, so even 1k * 4B = 4kB should fit
> >>> inside an otherwise unused server-class system. Would be more
> >>> believable (to me at least) if the drop-off happened at 100k or
> >>> more.
> >> It is not only the value (and key) size. There is overhead:
> >> htab_elem alone is 48 bytes, and the key and value also need to be
> >> 8-byte aligned.
> >>
> >
> > Right, late-night math didn't add up. Now I'm wondering if we can
> > make the hashmap behave much better; that drop-off is looking really
> > ugly.
> >
> >> From a random machine:
> >> lscpu -C
> >> NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL  SETS PHY-LINE COHERENCY-SIZE
> >> L1d       32K     576K    8 Data            1    64        1             64
> >> L1i       32K     576K    8 Instruction     1    64        1             64
> >> L2         1M      18M   16 Unified         2  1024        1             64
> >> L3      24.8M    24.8M   11 Unified         3 36864        1             64
> >
> > Could you do a couple more data points then, num keys=100,200,400? I
> > would expect those to fit in the cache and be the same as 10 by the
> > cache theory. I could try as well, but it's looking like Friday before
> > I have a spare moment.
> >
> > Thanks,
> > John
>
> Here's a benchmark run with those num_keys.
>
> Hashmap Control
> ===============
> num keys: 10
> hashmap (control) sequential get: hits throughput: 23.072 ± 0.208 M ops/s, hits latency: 43.343 ns/op, important_hits throughput: 23.072 ± 0.208 M ops/s
>
> num keys: 100
> hashmap (control) sequential get: hits throughput: 17.967 ± 0.236 M ops/s, hits latency: 55.659 ns/op, important_hits throughput: 17.967 ± 0.236 M ops/s
>

Hmmm, this is interesting. I expected a flat line and then a hard
drop-off, which to me would indicate cache misses being the culprit. But
this is almost a linear perf decrease over num_keys. Guess we need to
debug more. Rough working-set math after the rest of the numbers below.
> num keys: 200
> hashmap (control) sequential get: hits throughput: 17.812 ± 0.428 M ops/s, hits latency: 56.143 ns/op, important_hits throughput: 17.812 ± 0.428 M ops/s
>
> num keys: 300
> hashmap (control) sequential get: hits throughput: 17.070 ± 0.293 M ops/s, hits latency: 58.582 ns/op, important_hits throughput: 17.070 ± 0.293 M ops/s
>
> num keys: 400
> hashmap (control) sequential get: hits throughput: 17.667 ± 0.316 M ops/s, hits latency: 56.604 ns/op, important_hits throughput: 17.667 ± 0.316 M ops/s
>
> num keys: 500
> hashmap (control) sequential get: hits throughput: 17.010 ± 0.409 M ops/s, hits latency: 58.789 ns/op, important_hits throughput: 17.010 ± 0.409 M ops/s
>
> num keys: 1000
> hashmap (control) sequential get: hits throughput: 14.330 ± 0.172 M ops/s, hits latency: 69.784 ns/op, important_hits throughput: 14.330 ± 0.172 M ops/s
>
> num keys: 10000
> hashmap (control) sequential get: hits throughput: 6.047 ± 0.024 M ops/s, hits latency: 165.380 ns/op, important_hits throughput: 6.047 ± 0.024 M ops/s
>
> num keys: 100000
> hashmap (control) sequential get: hits throughput: 4.472 ± 0.163 M ops/s, hits latency: 223.630 ns/op, important_hits throughput: 4.472 ± 0.163 M ops/s
>
> num keys: 4194304
> hashmap (control) sequential get: hits throughput: 2.785 ± 0.024 M ops/s, hits latency: 359.066 ns/op, important_hits throughput: 2.785 ± 0.024 M ops/s
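Rough back-of-envelope on the cache theory, using the ~48-byte htab_elem
overhead and the 8-byte key/value alignment Martin mentioned. It assumes
a 4-byte int key and value and ignores the bucket array and allocator
rounding, so treat it as a lower bound. Throwaway userspace sketch:

#include <stdio.h>

/* Per-entry footprint estimate: ~48 bytes of htab_elem overhead plus the
 * key and value, each rounded up to 8 bytes.  The 4-byte key/value is an
 * assumption; bucket array and allocator rounding are ignored.
 */
static unsigned long align8(unsigned long x)
{
        return (x + 7) & ~7UL;
}

int main(void)
{
        const unsigned long elem_overhead = 48; /* approx htab_elem size from above */
        const unsigned long key_sz = 4, val_sz = 4;
        const unsigned long per_elem = elem_overhead + align8(key_sz) + align8(val_sz);
        const unsigned long num_keys[] = { 10, 100, 1000, 10000, 100000, 4194304 };
        unsigned int i;

        for (i = 0; i < sizeof(num_keys) / sizeof(num_keys[0]); i++)
                printf("num_keys %7lu -> ~%.1f KiB working set\n",
                       num_keys[i], num_keys[i] * per_elem / 1024.0);

        return 0;
}

With those assumptions each entry is ~64 bytes, i.e. one cache line:
~6 KiB at 100 keys, ~62 KiB at 1k (past the 32K L1d but well inside L2),
~625 KiB at 10k, ~6 MiB at 100k, and ~256 MiB at 4M keys, an order of
magnitude past the 24.8M L3. The drops at 10k and beyond roughly line up
with that, but the 100-1000 range should still sit comfortably in L2, so
the slope there is probably not just capacity misses.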