Re: [PATCH bpf-next 1/2] bpf: add new map ops ->map_pressure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Alexei Starovoitov wrote:
> On Fri, Jun 2, 2023 at 7:20 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Jun 01, 2023 at 05:40:10PM -0700, Alexei Starovoitov wrote:
> > > On Thu, Jun 1, 2023 at 11:24 AM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jun 1, 2023 at 11:17 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > LRU logic doesn't kick in until the map is full.
> > > > >
> > > > > In fact, it can: a reproducable example is in the self-test from this patch
> > > > > series. In the test N threads try to insert random values for keys 1..3000
> > > > > simultaneously. As the result, the map may contain any number of elements,
> > > > > typically 100 to 1000 (never full 3000, which is also less than the map size).
> > > > > So a user can't really even closely estimate the number of elements in the LRU
> > > > > map based on the number of updates (with unique keys). A per-cpu counter
> > > > > inc/dec'ed from the kernel side would solve this.
> > > >
> > > > That's odd and unexpected.
> > > > Definitely something to investigate and fix in the LRU map.
> > > >
> > > > Pls cc Martin in the future.
> > > >
> > > > > > If your LRU map is not full you shouldn't be using LRU in the first place.
> > > > >
> > > > > This makes sense, yes, especially that LRU evictions may happen randomly,
> > > > > without a map being full. I will step back with this patch until we investigate
> > > > > if we can replace LRUs with hashes.
> > > > >
> > > > > Thanks for the comments!
> > >
> > > Thinking about it more...
> > > since you're proposing to use percpu counter unconditionally for prealloc
> > > and percpu_counter_add_batch() logic is batched,
> > > it could actually be acceptable if it's paired with non-api access.
> > > Like another patch can add generic kfunc to do __percpu_counter_sum()
> > > and in the 3rd patch kernel/bpf/preload/iterators/iterators.bpf.c
> > > for maps can be extended to print the element count, so the user can have
> > > convenient 'cat /sys/fs/bpf/maps.debug' way to debug maps.
> > >
> > > But additional logic of percpu_counter_add_batch() might get in the way
> > > of debugging eventually.
> > > If we want to have stats then we can have normal per-cpu u32 in basic
> > > struct bpf_map that most maps, except array, will inc/dec on update/delete.
> > > kfunc to iterate over percpu is still necessary.
> > > This way we will be able to see not only number of elements, but detect
> > > bad usage when one cpu is only adding and another cpu is deleting elements.
> > > And other cpu misbalance.
> >
> > This looks for me like two different things: one is a kfunc to get the current
> > counter (e.g., bpf_map_elements_count), the other is a kfunc to dump some more
> > detailed stats (e.g., per-cpu values or more).
> >
> > My patch, slightly modified, addresses the first goal: most maps of interest
> > already have a counter in some form (sometimes just atomic_t or u64+lock). If
> > we add a percpu (non-batch) counter for pre-allocated hashmaps, then it's done:
> > the new kfunc can get the counter based on the map type.
> >
> > If/when there's need to provide per-cpu statistics of elements or some more
> > sophisticated statistics, this can be done without changing the api of the
> > bpf_map_elements_count() kfunc.
> >
> > Would this work?
> 
> No, because bpf_map_elements_count() as a building block is too big
> and too specific. Nothing else can be made out of it, but counting
> elements.
> "for_each_cpu in per-cpu variable" would be generic that is usable beyond
> this particular use case of stats collection.

Without much thought, could you hook the eviction logic in LRU to know
when the evict happens and even more details about what was evicted so
we could debug the random case where we evict something in a conntrack
table and then later it comes back to life and sends some data like a
long living UDP session.

For example in the cases where you build an LRU map because in 99%
cases no evictions happen and the LRU is just there as a backstop
you might even generate events to userspace to let it know evicts
are in progress and they should do something about it.

Thanks,
John




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux