On Fri, Jun 23, 2023 at 2:53 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>
> On 6/22/23 11:53 AM, Anton Protopopov wrote:
> > This series adds a mechanism for maps to populate per-cpu counters of elements
> > on insertions/deletions. The sum of these counters can be accessed by a new
> > kfunc from a map iterator program.
> >
> > The following patches are present in the series:
> >
> >   * Patch 1 adds a generic per-cpu counter to struct bpf_map
> >   * Patch 2 utilizes this mechanism for hash-based maps
> >   * Patch 3 extends the preloaded map iterator to dump the sum
> >   * Patch 4 adds a self-test for the change
> >
> > The reason for adding this functionality in our case (Cilium) is to get
> > signals about how full some heavily-used maps are and what the actual dynamic
> > profile of map capacity is. In the case of LRU maps it is impossible to get
> > this information any other way. See also [1].
> >
> > This is a v2 of https://lore.kernel.org/bpf/20230531110511.64612-1-aspsk@xxxxxxxxxxxxx/T/#t
> > rewritten according to previous comments. I've turned this series into
> > an RFC for two reasons:
> >
> > 1) This patch only works on systems where this_cpu_{inc,dec} is atomic for s64.
> > Systems which might write s64 non-atomically would need to use some locking
> > mechanism to prevent readers from reading garbage via the
> > bpf_map_sum_elements_counter() kfunc (see patch 1).
> >
> > 2) In comparison with v1, we're adding extra instructions per map operation
> > (for preallocated maps as well as for non-preallocated maps). The only
> > functionality we're interested in at the moment is the number of elements
> > present in a map, not per-cpu statistics. This could be better achieved by
> > using the v1 version, which only adds computations for preallocated maps.
> >
> > So, the question is: would it be fine to do the changes in the following way:
> >
> >   * extend the preallocated hash maps to populate percpu batch counters as in v1
> >   * add a kfunc as in v2 to get the current sum
> >
> > This works because
> >
> >   * nobody at the moment actually requires the per-cpu statistics
> >   * this implementation can be transparently turned into per-cpu statistics, if
> >     such a need occurs in practice (the only thing to change would be to
> >     re-implement the kfunc and, maybe, add more kfuncs to get per-cpu stats)
> >   * the "v1 way" is the least intrusive: it only affects preallocated maps, as
> >     other maps already provide the required functionality
> >
> > [1] https://lpc.events/event/16/contributions/1368/
> >
> > v1 -> v2:
> >   - make the counters a generic part of struct bpf_map
> >   - don't use map_info and /proc/self/fdinfo in favor of a kfunc
>
> Tbh, I did like the v1 approach a bit better. We are trying to bend over
> backwards just so that we don't add things to uapi, but in the end we are also
> adding it to maps.debug, etc (yes, it has .debug in the name and all) ...

I think we should keep bending even more backwards to avoid uapi burden.
All new features should be non-uapi no matter how many people say
"I'll definitely use it".

> or as an extensible bpf_map_info stats extension in case there is some agreement?

I'd rather not. bpf_map_info returns what user space sent to the kernel earlier.
Stats or anything that user space didn't explicitly give earlier is quite
different. Same goes for bpf_prog_info. We only added verified_insns there,
which doesn't fit the above definition, and it was a mistake. After almost
2 years it is still unused and cannot be removed.
veristat is parsing non-uapi verifier log. Tooling can live with non-uapi map stats just fine.
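
For illustration, a minimal sketch of how the kfunc-based interface discussed
above might be consumed from a bpf_map iterator program. The kfunc signature,
program name, and output format here are assumptions for the sake of the
example, not copied from the actual patches:

/*
 * Sketch only: the kfunc prototype below is assumed from the name
 * mentioned in this thread, not taken verbatim from patch 1.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

extern s64 bpf_map_sum_elements_counter(struct bpf_map *map) __ksym;

SEC("iter/bpf_map")
int dump_map_elem_count(struct bpf_iter__bpf_map *ctx)
{
	struct seq_file *seq = ctx->meta->seq;
	struct bpf_map *map = ctx->map;

	if (!map)
		return 0;

	/* Print the sum of the per-cpu counters updated on insert/delete. */
	BPF_SEQ_PRINTF(seq, "map id %u: %lld elements\n",
		       map->id, bpf_map_sum_elements_counter(map));
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

Pinning such an iterator and cat'ing it would give tooling the element count
without any new uapi surface, which is the point being argued above.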