Re: [PATCH v5 bpf-next] selftests/bpf: Add benchmark for local_storage get

On Fri, Jun 24, 2022 at 1:22 PM Martin KaFai Lau <kafai@xxxxxx> wrote:
>
> On Wed, Jun 22, 2022 at 04:00:04PM -0700, Yosry Ahmed wrote:
> > Thanks for adding these benchmarks!
> >
> >
> > On Sat, Jun 4, 2022 at 3:20 PM Dave Marchevsky <davemarchevsky@xxxxxx> wrote:
> > >
> > > Add benchmarks to demonstrate the performance cliff for local_storage
> > > get as the number of local_storage maps increases beyond the current
> > > local_storage implementation's cache size.
> > >
> > > "sequential get" and "interleaved get" benchmarks are added, both of
> > > which do many bpf_task_storage_get calls on sets of task local_storage
> > > maps of various counts, while considering a single specific map to be
> > > 'important' and counting task_storage_gets to the important map
> > > separately in addition to normal 'hits' count of all gets. Goal here is
> > > to mimic scenario where a particular program using one map - the
> > > important one - is running on a system where many other local_storage
> > > maps exist and are accessed often.
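
(For readers skimming the thread: each per-map get in the BPF program
boils down to something like the sketch below. This is a minimal
illustration assuming a tracepoint-attached program; the map and
program names are made up, not the actual benchmark source.)

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

/* One of the N task local_storage maps; the benchmark declares many. */
struct {
	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, long);
} map_0 SEC(".maps");

SEC("tp/syscalls/sys_enter_getpgid")
int get_local(void *ctx)
{
	struct task_struct *task = bpf_get_current_task_btf();
	long *data;

	/* Every successful get counts toward 'hits'; gets to the one
	 * 'important' map are also counted separately.
	 */
	data = bpf_task_storage_get(&map_0, task, 0,
				    BPF_LOCAL_STORAGE_GET_F_CREATE);
	if (data)
		*data += 1;
	return 0;
}

char LICENSE[] SEC("license") = "GPL";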
> > >
> > > While "sequential get" benchmark does bpf_task_storage_get for map 0, 1,
> > > ..., {9, 99, 999} in order, "interleaved" benchmark interleaves 4
> > > bpf_task_storage_gets for the important map for every 10 map gets. This
> > > is meant to highlight performance differences when important map is
> > > accessed far more frequently than non-important maps.
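
(If I'm reading that right, the access order can be modeled by the
standalone loop below; do_get() is a hypothetical stand-in for one
bpf_task_storage_get call, included just to show the interleaving.)

#include <stdio.h>

/* Hypothetical stand-in for a bpf_task_storage_get() on map map_idx. */
static void do_get(int map_idx)
{
	printf("get map %d\n", map_idx);
}

int main(void)
{
	int nr_maps = 10;	/* the benchmark also runs 100 and 1000 */

	for (int i = 0; i < nr_maps; i++) {
		do_get(i);			/* one get per map, in order */
		if ((i + 1) % 10 == 0)		/* after every 10 map gets... */
			for (int j = 0; j < 4; j++)
				do_get(0);	/* ...4 gets to important map 0 */
	}
	return 0;
}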
> > >
> > > A "hashmap control" benchmark is also included for easy comparison of
> > > standard bpf hashmap lookup vs local_storage get. The benchmark is
> > > similar to "sequential get", but creates and uses BPF_MAP_TYPE_HASH
> > > instead of local storage. Only one inner map is created - a hashmap
> > > meant to hold tid -> data mapping for all tasks. Size of the hashmap is
> > > hardcoded to my system's PID_MAX_LIMIT (4,194,304). The number of these
> > > keys which are actually fetched as part of the benchmark is
> > > configurable.
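
(Again as a hedged sketch of my reading - the names and the attach
point below are illustrative, and the actual selftest may structure
this differently:)

/* The single hashmap holding a tid -> data mapping for all tasks. */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 4194304);	/* PID_MAX_LIMIT on the test system */
	__type(key, __u32);		/* tid */
	__type(value, long);
} hash_ctrl SEC(".maps");

SEC("tp/syscalls/sys_enter_getpgid")
int get_hash(void *ctx)
{
	__u32 tid = (__u32)bpf_get_current_pid_tgid();	/* low 32 bits: tid */
	long *data = bpf_map_lookup_elem(&hash_ctrl, &tid);

	if (data)
		*data += 1;
	return 0;
}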
> > >
> > > The addition of this benchmark was inspired by a conversation with
> > > Alexei in a previous patchset's thread [0], which highlighted the need
> > > for such a benchmark to motivate and validate improvements to the
> > > local_storage implementation. My approach in that series focused on
> > > improving performance for explicitly-marked 'important' maps and was
> > > rejected, with feedback to make more generally-applicable improvements
> > > while avoiding explicitly marking maps as important. Thus the benchmark
> > > reports both general and important-map-focused metrics, so the effect
> > > of future work on both is clear.
> >
> > The current implementation falls back to a list traversal of
> > bpf_local_storage->list when there is a cache miss. I wonder if this
> > is a place with room for optimization? Maybe a hash table or a tree
> > would be a more performant alternative?
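
For reference, the lookup path I'm referring to looks roughly like the
below (simplified from kernel/bpf/bpf_local_storage.c; RCU/lockdep
annotations and the re-check of the elem's linked-ness under the lock
are omitted):

struct bpf_local_storage_data *
bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
			 struct bpf_local_storage_map *smap,
			 bool cacheit_lockit)
{
	struct bpf_local_storage_data *sdata;
	struct bpf_local_storage_elem *selem;
	unsigned long flags;

	/* Fast path: a small (16-slot) per-storage cache, indexed by
	 * the map's assigned cache_idx.
	 */
	sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
	if (sdata && rcu_access_pointer(sdata->smap) == smap)
		return sdata;

	/* Slow path on a cache miss: linear walk over every elem
	 * attached to this task's storage - the O(n) traversal in
	 * question.
	 */
	hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
		if (rcu_access_pointer(selem->sdata.smap) == smap)
			break;

	if (!selem)
		return NULL;

	sdata = &selem->sdata;
	if (cacheit_lockit) {
		/* Promote the found elem into the cache slot so the
		 * next lookup on this map takes the fast path.
		 */
		raw_spin_lock_irqsave(&local_storage->lock, flags);
		rcu_assign_pointer(local_storage->cache[smap->cache_idx],
				   sdata);
		raw_spin_unlock_irqrestore(&local_storage->lock, flags);
	}
	return sdata;
}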
> With a benchmark to ensure it won't regress the common use cases and
> to gauge the potential improvement, it is probably something Dave is
> planning to explore next, if I read the last thread correctly.
>
> How many task storages are in your production use cases?

Unfortunately, I don't have this information. I was just making a
suggestion based on code inspection, not based on data from
production.


