On Wed, 31 Jan 2024 at 20:52, Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
[...]
> > | num_maps: 1000
> > | local_storage cache sequential get:
> > |                   <before>              | <after>
> > |  hits throughput: 0.357 ± 0.005 M ops/s | 0.325 ± 0.005 M ops/s (-9.0%)
> > |  hits latency:    2803.738 ns/op        | 3076.923 ns/op (+9.7%)
>
> Is it understood why the slow down here? The same goes for the "num_maps: 32"
> case above but not as bad as here.

It turned out that there's a real slowdown due to the outlined
slowpath. If I inline everything except for inserting the entry into
the cache (cacheit_lockit codepath is still outlined), the results
look much better even for the case where it always misses the cache.

[...]
> > diff --git a/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c b/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
> > index a043d8fefdac..9895087a9235 100644
> > --- a/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
> > +++ b/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
> > @@ -21,7 +21,7 @@ struct {
> >         __type(value, long);
> >  } map_b SEC(".maps");
> >
> > -SEC("fentry/bpf_local_storage_lookup")
> > +SEC("fentry/bpf_local_storage_lookup_slowpath")
>
> The selftest is trying to catch recursion. The change here cannot test the same
> thing because the slowpath will never be hit in the test_progs. I don't have a
> better idea for now also.

Trying to prepare a v2, and for the test, the only option I see is to
introduce a tracepoint ("bpf_local_storage_lookup"). If unused, should
be a no-op due to static branch.

Or can you suggest different functions to hook to for the recursion test?

Preferences?
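
For reference, a rough userspace sketch of the split described above:
the cache check and the element walk are inlined, and only the cache
insertion stays out of line. This is not the kernel code. All names
here (struct storage, storage_lookup, storage_cacheit, CACHE_SLOTS,
MAX_ELEMS) are hypothetical, there is no locking, and a flat array
stands in for the real element list.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 16	/* hypothetical number of cache slots */
#define MAX_ELEMS   64	/* hypothetical capacity of the toy storage */

struct map {
	unsigned int cache_idx;	/* slot this map uses in the cache */
};

struct elem {
	const struct map *map;
	long value;
};

struct storage {
	struct elem *cache[CACHE_SLOTS];
	struct elem elems[MAX_ELEMS];
	size_t nr_elems;
};

/* Add an element for map m; returns NULL when the storage is full. */
static struct elem *storage_add(struct storage *s, const struct map *m,
				long value)
{
	struct elem *e;

	if (s->nr_elems == MAX_ELEMS)
		return NULL;
	e = &s->elems[s->nr_elems++];
	e->map = m;
	e->value = value;
	return e;
}

/* Outlined on purpose: only the cache insertion lives here. */
__attribute__((noinline))
static void storage_cacheit(struct storage *s, const struct map *m,
			    struct elem *e)
{
	assert(m->cache_idx < CACHE_SLOTS);
	s->cache[m->cache_idx] = e;
}

/* Inlined: cache check plus the walk taken on a cache miss. */
static inline struct elem *storage_lookup(struct storage *s,
					  const struct map *m, int cacheit)
{
	struct elem *e = s->cache[m->cache_idx];
	size_t i;

	if (e && e->map == m)
		return e;		/* cache hit */

	for (i = 0; i < s->nr_elems; i++) {
		e = &s->elems[i];
		if (e->map != m)
			continue;
		if (cacheit)
			storage_cacheit(s, m, e);
		return e;		/* found on the miss path */
	}
	return NULL;			/* no element for this map */
}
```

With this shape, a lookup that always misses the cache still avoids a
function call for the walk itself, and only pays for one when it
actually inserts into the cache.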