On Fri, Jun 02, 2023 at 09:23:11AM -0700, Alexei Starovoitov wrote:
> On Fri, Jun 2, 2023 at 7:20 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Jun 01, 2023 at 05:40:10PM -0700, Alexei Starovoitov wrote:
> > > On Thu, Jun 1, 2023 at 11:24 AM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jun 1, 2023 at 11:17 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > LRU logic doesn't kick in until the map is full.
> > > > >
> > > > > In fact, it can: a reproducible example is in the self-test from this patch
> > > > > series. In the test N threads try to insert random values for keys 1..3000
> > > > > simultaneously. As a result, the map may contain any number of elements,
> > > > > typically 100 to 1000 (never the full 3000, which is also less than the map size).
> > > > > So a user can't really even closely estimate the number of elements in the LRU
> > > > > map based on the number of updates (with unique keys). A per-cpu counter
> > > > > inc/dec'ed from the kernel side would solve this.
> > > >
> > > > That's odd and unexpected.
> > > > Definitely something to investigate and fix in the LRU map.
> > > >
> > > > Pls cc Martin in the future.
> > > >
> > > > > > If your LRU map is not full you shouldn't be using LRU in the first place.
> > > > >
> > > > > This makes sense, yes, especially since LRU evictions may happen randomly,
> > > > > without a map being full. I will step back with this patch until we investigate
> > > > > if we can replace LRUs with hashes.
> > > > >
> > > > > Thanks for the comments!
> > >
> > > Thinking about it more...
> > > since you're proposing to use percpu counter unconditionally for prealloc
> > > and percpu_counter_add_batch() logic is batched,
> > > it could actually be acceptable if it's paired with non-api access.
> > > Like another patch can add generic kfunc to do __percpu_counter_sum()
> > > and in the 3rd patch kernel/bpf/preload/iterators/iterators.bpf.c
> > > for maps can be extended to print the element count, so the user can have
> > > convenient 'cat /sys/fs/bpf/maps.debug' way to debug maps.
> > >
> > > But additional logic of percpu_counter_add_batch() might get in the way
> > > of debugging eventually.
> > > If we want to have stats then we can have normal per-cpu u32 in basic
> > > struct bpf_map that most maps, except array, will inc/dec on update/delete.
> > > kfunc to iterate over percpu is still necessary.
> > > This way we will be able to see not only number of elements, but detect
> > > bad usage when one cpu is only adding and another cpu is deleting elements.
> > > And other cpu misbalance.
> >
> > This looks to me like two different things: one is a kfunc to get the current
> > counter (e.g., bpf_map_elements_count), the other is a kfunc to dump some more
> > detailed stats (e.g., per-cpu values or more).
> >
> > My patch, slightly modified, addresses the first goal: most maps of interest
> > already have a counter in some form (sometimes just atomic_t or u64+lock). If
> > we add a percpu (non-batch) counter for pre-allocated hashmaps, then it's done:
> > the new kfunc can get the counter based on the map type.
> >
> > If/when there's a need to provide per-cpu statistics of elements or some more
> > sophisticated statistics, this can be done without changing the api of the
> > bpf_map_elements_count() kfunc.
> >
> > Would this work?
>
> No, because bpf_map_elements_count() as a building block is too big
> and too specific. Nothing else can be made out of it but counting
> elements.
> "for_each_cpu in per-cpu variable" would be generic and usable beyond
> this particular use case of stats collection.

Thanks. I will prepare a v2 with a "no-uapi percpu" version.
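
Roughly what I have in mind for v2, as an untested sketch only (the field and
helper names below -- elem_count, bpf_map_inc_elem_count(),
bpf_map_dec_elem_count(), bpf_map_sum_elem_count() -- are placeholders, not
existing kernel API):

/* Sketch only, names are tentative. A plain (non-batched) per-cpu counter
 * in struct bpf_map, inc/dec'ed by map implementations on update/delete,
 * plus a kfunc that walks the per-cpu values with for_each_possible_cpu().
 */
struct bpf_map {
	/* ... existing fields ... */
	s64 __percpu *elem_count;	/* allocated with alloc_percpu() at map creation */
};

/* called by a map implementation after a successful insert */
static inline void bpf_map_inc_elem_count(struct bpf_map *map)
{
	this_cpu_inc(*map->elem_count);
}

/* called by a map implementation after a successful delete */
static inline void bpf_map_dec_elem_count(struct bpf_map *map)
{
	this_cpu_dec(*map->elem_count);
}

/* kfunc: sum the per-cpu counters into a single element count */
__bpf_kfunc s64 bpf_map_sum_elem_count(const struct bpf_map *map)
{
	s64 count = 0;
	int cpu;

	for_each_possible_cpu(cpu)
		count += *per_cpu_ptr(map->elem_count, cpu);

	return count;
}

The kfunc could then be called from the maps iterator in
kernel/bpf/preload/iterators/iterators.bpf.c to print the element count in
/sys/fs/bpf/maps.debug, while the raw per-cpu values remain available for
the add/delete misbalance debugging you mention.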