Re: [PATCH bpf-next 1/2] bpf: add new map ops ->map_pressure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 1, 2023 at 11:24 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Jun 1, 2023 at 11:17 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> > >
> > > LRU logic doesn't kick in until the map is full.
> >
> > In fact, it can: a reproducable example is in the self-test from this patch
> > series. In the test N threads try to insert random values for keys 1..3000
> > simultaneously. As the result, the map may contain any number of elements,
> > typically 100 to 1000 (never full 3000, which is also less than the map size).
> > So a user can't really even closely estimate the number of elements in the LRU
> > map based on the number of updates (with unique keys). A per-cpu counter
> > inc/dec'ed from the kernel side would solve this.
>
> That's odd and unexpected.
> Definitely something to investigate and fix in the LRU map.
>
> Pls cc Martin in the future.
>
> > > If your LRU map is not full you shouldn't be using LRU in the first place.
> >
> > This makes sense, yes, especially that LRU evictions may happen randomly,
> > without a map being full. I will step back with this patch until we investigate
> > if we can replace LRUs with hashes.
> >
> > Thanks for the comments!

Thinking about it more...
since you're proposing to use percpu counter unconditionally for prealloc
and percpu_counter_add_batch() logic is batched,
it could actually be acceptable if it's paired with non-api access.
Like another patch can add generic kfunc to do __percpu_counter_sum()
and in the 3rd patch kernel/bpf/preload/iterators/iterators.bpf.c
for maps can be extended to print the element count, so the user can have
convenient 'cat /sys/fs/bpf/maps.debug' way to debug maps.

But additional logic of percpu_counter_add_batch() might get in the way
of debugging eventually.
If we want to have stats then we can have normal per-cpu u32 in basic
struct bpf_map that most maps, except array, will inc/dec on update/delete.
kfunc to iterate over percpu is still necessary.
This way we will be able to see not only number of elements, but detect
bad usage when one cpu is only adding and another cpu is deleting elements.
And other cpu misbalance.

but debugging and stats is a slippery slope. These simple stats won't be
enough and people will be tempted to add more and more.
So I agree that there is a need for bpf map observability,
but it is not clear whether hard coded stats is the solution.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux