On Wed, Jun 22, 2022 at 04:00:04PM -0700, Yosry Ahmed wrote:
> Thanks for adding these benchmarks!
>
> On Sat, Jun 4, 2022 at 3:20 PM Dave Marchevsky <davemarchevsky@xxxxxx> wrote:
> >
> > Add benchmarks to demonstrate the performance cliff for local_storage
> > get as the number of local_storage maps increases beyond the current
> > local_storage implementation's cache size.
> >
> > "sequential get" and "interleaved get" benchmarks are added, both of
> > which do many bpf_task_storage_get calls on sets of task local_storage
> > maps of various counts, while considering a single specific map to be
> > 'important' and counting task_storage_gets to the important map
> > separately in addition to the normal 'hits' count of all gets. The
> > goal here is to mimic a scenario where a particular program using one
> > map - the important one - is running on a system where many other
> > local_storage maps exist and are accessed often.
> >
> > While the "sequential get" benchmark does bpf_task_storage_get for
> > maps 0, 1, ..., {9, 99, 999} in order, the "interleaved" benchmark
> > interleaves 4 bpf_task_storage_gets for the important map for every
> > 10 map gets. This is meant to highlight performance differences when
> > the important map is accessed far more frequently than non-important
> > maps.
> >
> > A "hashmap control" benchmark is also included for easy comparison of
> > standard bpf hashmap lookup vs local_storage get. The benchmark is
> > similar to "sequential get", but creates and uses BPF_MAP_TYPE_HASH
> > instead of local storage. Only one inner map is created - a hashmap
> > meant to hold a tid -> data mapping for all tasks. The size of the
> > hashmap is hardcoded to my system's PID_MAX_LIMIT (4,194,304). The
> > number of these keys which are actually fetched as part of the
> > benchmark is configurable.
> >
> > The addition of this benchmark is inspired by a conversation with
> > Alexei in a previous patchset's thread [0], which highlighted the
> > need for such a benchmark to motivate and validate improvements to
> > the local_storage implementation. My approach in that series focused
> > on improving performance for explicitly-marked 'important' maps and
> > was rejected with feedback to make more generally-applicable
> > improvements while avoiding explicitly marking maps as important.
> > Thus the benchmark reports both general and important-map-focused
> > metrics, so the effect of future work on both is clear.
>
> The current implementation falls back to a list traversal of
> bpf_local_storage->list when there is a cache miss. I wonder if this
> is a place with room for optimization? Maybe a hash table or a tree
> would be a more performant alternative?

With a benchmark to ensure it won't regress the common use cases and
gauge the potential improvement, it is probably something Dave is
planning to explore next, if I read the last thread correctly.

How many task storages are in your production use cases?
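
For readers who have not seen the selftest, a minimal sketch of the
"sequential get" access pattern described in the commit message might
look like the following. This is not the code from the patch: the map
names, the hit counters, and the x86-64-specific getpgid attach point
are illustrative assumptions; only the overall shape (one
bpf_task_storage_get per map, with the 'important' map counted
separately) comes from the description above.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

/* One of N task local_storage maps; the benchmark scales N to
 * 10/100/1000. Declarations for further maps are identical. */
struct {
	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, long);
} storage0 SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, long);
} storage1 SEC(".maps");

long hits;           /* successful gets across all maps */
long important_hits; /* gets on the single 'important' map */

static __always_inline void do_get(void *map, bool important)
{
	struct task_struct *task = bpf_get_current_task_btf();
	long *val;

	val = bpf_task_storage_get(map, task, 0,
				   BPF_LOCAL_STORAGE_GET_F_CREATE);
	if (val) {
		__sync_fetch_and_add(&hits, 1);
		if (important)
			__sync_fetch_and_add(&important_hits, 1);
	}
}

SEC("fentry/__x64_sys_getpgid")
int BPF_PROG(seq_get)
{
	/* "sequential get": touch map 0 (the important one), then the
	 * rest in order. "interleaved get" would additionally revisit
	 * storage0 4 times for every 10 of these gets. */
	do_get(&storage0, true);
	do_get(&storage1, false);
	return 0;
}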
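The "hashmap control" variant replaces the per-task storages with a
single hashmap. A hedged sketch of its shape (the map name and value
type are assumptions; the tid keying and PID_MAX_LIMIT sizing come from
the commit message):

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 4194304); /* PID_MAX_LIMIT on the author's system */
	__type(key, int);             /* tid */
	__type(value, long);
} hashmap_control SEC(".maps");

/* The control benchmark then does bpf_map_lookup_elem(&hashmap_control,
 * &tid) where the local_storage variants do bpf_task_storage_get(). */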
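For context on the cache-miss path Yosry is asking about, the lookup in
kernel/bpf/bpf_local_storage.c was, roughly, the following. This is a
simplified paraphrase from memory, not verbatim kernel code:

/* Sketch of bpf_local_storage_lookup(): a small fixed-size per-task
 * cache (16 slots) is consulted first; on a miss, the lookup walks
 * every element on bpf_local_storage->list, which is the O(n) walk the
 * benchmarks above stress as the map count grows. */
sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
if (sdata && rcu_access_pointer(sdata->smap) == smap)
	return sdata; /* cache hit: O(1) */

/* cache miss: linear in the number of storages attached to the task */
hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
	if (rcu_access_pointer(SDATA(selem)->smap) == smap)
		break;

A hash table or tree keyed by the map, as suggested, would target
exactly this walk.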