On Thu, Jun 1, 2023 at 12:30 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
>
> On Wed, May 31, 2023 at 11:24:29AM -0700, Alexei Starovoitov wrote:
> > On Wed, May 31, 2023 at 11:05:10AM +0000, Anton Protopopov wrote:
> > >  static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
> > >  {
> > >  	htab_put_fd_value(htab, l);
> > >
> > > +	dec_elem_count(htab);
> > > +
> > >  	if (htab_is_prealloc(htab)) {
> > >  		check_and_free_fields(htab, l);
> > >  		__pcpu_freelist_push(&htab->freelist, &l->fnode);
> > >  	} else {
> > > -		dec_elem_count(htab);
> > >  		htab_elem_free(htab, l);
> > >  	}
> > >  }
> > > @@ -1006,6 +1024,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
> > >  		if (!l)
> > >  			return ERR_PTR(-E2BIG);
> > >  		l_new = container_of(l, struct htab_elem, fnode);
> > > +		inc_elem_count(htab);
> >
> > The current use_percpu_counter heuristic is far from perfect. It works for some cases,
> > but will surely get bad, as the comment next to PERCPU_COUNTER_BATCH is trying to say.
> > Hence, there is a big performance risk doing inc/dec everywhere.
> > Hence, this is a nack: we cannot decrease the performance of various maps for the few folks
> > who want to see map stats.
>
> This patch adds the inc/dec only for preallocated hashtabs and doesn't change
> the code for BPF_F_NO_PREALLOC maps (those already do the incs/decs where needed). And for
> preallocated hashtabs we don't need to compare counters,

Exactly. That's why I don't like adding inc/dec that serves no purpose other than stats.

> so a raw (non-batch)
> percpu counter may be used for this case.

And you can do it inside your own bpf prog.

> > If you want to see "pressure", please switch cilium to use the bpf_mem_alloc htab and
> > use tracing-style direct 'struct bpf_htab' access, like progs/map_ptr_kern.c demonstrates.
> > No kernel patches needed.
> > Then bpf_prog_run such a tracing prog and read all the internal map info.
> > It's less convenient than exposing things in uapi, but not being uapi is the point.
>
> Thanks for the pointers, this makes sense. However, this doesn't work for LRU,
> which is always pre-allocated. Would it be ok if we add a non-batch percpu
> counter for the !BPF_F_NO_PREALLOC case and don't expose it directly to userspace?

LRU logic doesn't kick in until the map is full.
If your LRU map is not full, you shouldn't be using LRU in the first place.
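
For reference, a minimal sketch of the map_ptr_kern.c-style access suggested above could look like the code below. It mirrors the selftest, which uses a cgroup_skb program; the partial struct mirrors, the map, and the read_htab_internals / out_* names are illustrative assumptions, and the kernel-internal bpf_htab layout is not uapi and can change between kernel versions.

// SPDX-License-Identifier: GPL-2.0
/* Sketch only, modeled on tools/testing/selftests/bpf/progs/map_ptr_kern.c.
 * The structs below are partial, hand-written mirrors of kernel-internal
 * types; they are not uapi and their layout may change between kernels.
 * CO-RE (preserve_access_index) relocates the field offsets at load time.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct bpf_map {
	__u32 max_entries;
} __attribute__((preserve_access_index));

struct bpf_htab {
	struct bpf_map map;
	__u32 n_buckets;
	struct {
		int counter;
	} count;	/* mirrors the kernel's atomic_t count; meaningful only for
			 * non-preallocated hashtabs, and only while the kernel is
			 * not using the batched percpu counter instead */
} __attribute__((preserve_access_index));

/* Hypothetical map under observation; any non-preallocated hash map works. */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__uint(max_entries, 4096);
	__type(key, __u32);
	__type(value, __u64);
} my_hash SEC(".maps");

/* Outputs, readable from userspace through the skeleton's .bss */
__u32 out_max_entries;
__u32 out_n_buckets;
int out_count;

SEC("cgroup_skb/egress")	/* same program type the selftest uses */
int read_htab_internals(struct __sk_buff *skb)
{
	struct bpf_htab *htab = (struct bpf_htab *)&my_hash;

	/* Direct, read-only access to kernel-internal map state:
	 * no kernel patch and no new uapi needed. */
	out_max_entries = htab->map.max_entries;
	out_n_buckets = htab->n_buckets;
	out_count = htab->count.counter;

	return 1;	/* cgroup_skb verdict: allow */
}

char _license[] SEC("license") = "GPL";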
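
The "bpf_prog_run such a tracing prog" step from userspace could then look roughly like this, assuming a skeleton generated with bpftool gen skeleton from an object like the one above (the htab_stats names are hypothetical). cgroup_skb programs can be driven through BPF_PROG_TEST_RUN with a dummy packet, which is how the map_ptr selftest runs its program.

// SPDX-License-Identifier: GPL-2.0
/* Userspace sketch: run the prog once via BPF_PROG_RUN and read the outputs.
 * The "htab_stats" skeleton and program/variable names are hypothetical.
 */
#include <stdio.h>
#include <bpf/libbpf.h>
#include "htab_stats.skel.h"

int main(void)
{
	struct htab_stats *skel;
	char pkt[64] = {};	/* dummy packet; a cgroup_skb test run needs some data */
	LIBBPF_OPTS(bpf_test_run_opts, opts,
		.data_in = pkt,
		.data_size_in = sizeof(pkt),
		.repeat = 1,
	);
	int err;

	skel = htab_stats__open_and_load();
	if (!skel)
		return 1;

	/* ... fill skel->maps.my_hash here, e.g. via bpf_map__update_elem() ... */

	err = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.read_htab_internals),
				     &opts);
	if (!err)
		printf("max_entries=%u n_buckets=%u count=%d\n",
		       skel->bss->out_max_entries,
		       skel->bss->out_n_buckets,
		       skel->bss->out_count);

	htab_stats__destroy(skel);
	return err;
}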