Re: [PATCH bpf-next 1/2] bpf: add new map ops ->map_pressure

On Thu, Jun 01, 2023 at 09:39:43AM -0700, Alexei Starovoitov wrote:
> On Thu, Jun 1, 2023 at 12:30 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, May 31, 2023 at 11:24:29AM -0700, Alexei Starovoitov wrote:
> > > On Wed, May 31, 2023 at 11:05:10AM +0000, Anton Protopopov wrote:
> > > >  static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
> > > >  {
> > > >     htab_put_fd_value(htab, l);
> > > >
> > > > +   dec_elem_count(htab);
> > > > +
> > > >     if (htab_is_prealloc(htab)) {
> > > >             check_and_free_fields(htab, l);
> > > >             __pcpu_freelist_push(&htab->freelist, &l->fnode);
> > > >     } else {
> > > > -           dec_elem_count(htab);
> > > >             htab_elem_free(htab, l);
> > > >     }
> > > >  }
> > > > @@ -1006,6 +1024,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
> > > >                     if (!l)
> > > >                             return ERR_PTR(-E2BIG);
> > > >                     l_new = container_of(l, struct htab_elem, fnode);
> > > > +                   inc_elem_count(htab);
> > >
> > > The current use_percpu_counter heuristic is far from perfect. It works for some cases,
> > > but will surely get bad as the comment next to PERCPU_COUNTER_BATCH is trying to say.
> > > Hence, there is a big performance risk doing inc/dec everywhere.
> > > Hence, this is a nack: we cannot decrease performance of various maps for a few folks
> > > who want to see map stats.
> >
> > This patch adds some inc/dec only for preallocated hashtabs and doesn't change
> > code for BPF_F_NO_PREALLOC (they already do incs/decs where needed). And for
> > preallocated hashtabs we don't need to compare counters,
> 
> exactly. that's why I don't like to add inc/dec that serves no purpose
> other than stats.
> 
> > so a raw (non-batch)
> > percpu counter may be used for this case.
> 
> and you can do it inside your own bpf prog.
> 
> > > If you want to see "pressure", please switch cilium to use bpf_mem_alloc htab and
> > > use tracing style direct 'struct bpf_htab' access like progs/map_ptr_kern.c is demonstrating.
> > > No kernel patches needed.
> > > Then bpf_prog_run such tracing prog and read all internal map info.
> > > It's less convenient than exposing things in uapi, but not being uapi is the point.
> >
> > Thanks for the pointers, this makes sense. However, this doesn't work for LRU
> > which is always pre-allocated. Would it be ok if we add non-batch percpu
> > counter for !BPF_F_NO_PREALLOC case and won't expose it directly to userspace?
> 
> LRU logic doesn't kick in until the map is full.

In fact, it can: a reproducible example is in the self-test from this patch
series. In the test, N threads simultaneously insert random values for keys
1..3000. As a result, the map may contain any number of elements, typically
100 to 1000 (never the full 3000, which is also less than the map size). So a
user can't even roughly estimate the number of elements in the LRU map from
the number of updates, even with unique keys. A per-cpu counter inc/dec'ed
from the kernel side would solve this.
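
To be concrete, for the preallocated case I have in mind a raw (non-batched)
per-CPU counter which is only summed on the slow path, when user space asks
for stats. A minimal sketch (the pcount field and sum_elem_count are
illustrative names, not from this patch):

/* Raw per-cpu element counter for preallocated htabs; avoids the
 * PERCPU_COUNTER_BATCH heuristic entirely.  Sketch only.
 */
int __percpu *pcount;	/* new field in struct bpf_htab */

static void inc_elem_count(struct bpf_htab *htab)
{
	this_cpu_inc(*htab->pcount);
}

static void dec_elem_count(struct bpf_htab *htab)
{
	this_cpu_dec(*htab->pcount);
}

/* Slow path only, e.g. when map stats are read from user space;
 * the update path above needs no cross-cpu synchronization.
 */
static u64 sum_elem_count(struct bpf_htab *htab)
{
	s64 sum = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		sum += *per_cpu_ptr(htab->pcount, cpu);

	return sum > 0 ? sum : 0;
}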

> If your LRU map is not full you shouldn't be using LRU in the first place.

This makes sense, yes, especially since LRU evictions may happen randomly,
without the map being full. I will hold off on this patch until we've
investigated whether we can replace the LRUs with plain hash maps.
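
For reference, my (possibly naive) reading of the map_ptr_kern.c-style access
is sketched below: mirror the relevant kernel structs with
preserve_access_index, cast the map pointer, and read the internal fields from
a BTF-enabled prog. Untested, and the map/prog names and the mirrored field
set are just for illustration:

#include <linux/types.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* Partial mirrors of the kernel structs; CO-RE relocates the field
 * accesses against the running kernel's BTF.
 */
struct bpf_map {
	__u32 max_entries;
} __attribute__((preserve_access_index));

struct bpf_htab {
	struct bpf_map map;
	__u32 n_buckets;
	__u32 elem_size;
} __attribute__((preserve_access_index));

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1000);
	__type(key, __u32);
	__type(value, __u64);
} my_htab SEC(".maps");

SEC("fentry/bpf_fentry_test1")
int BPF_PROG(dump_htab_info)
{
	struct bpf_htab *htab = (struct bpf_htab *)&my_htab;

	/* Other internal fields can be read the same way, as long as
	 * the map type actually maintains them.
	 */
	bpf_printk("max_entries=%u n_buckets=%u elem_size=%u",
		   htab->map.max_entries, htab->n_buckets,
		   htab->elem_size);
	return 0;
}

That approach covers the non-preallocated case well; the missing piece for us
is only that preallocated (and thus LRU) maps keep no element counter to read.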

Thanks for the comments!



