This series adds a mechanism for maps to populate per-cpu counters of elements on insertions/deletions. The sum of these counters can be accessed by a new kfunc from a map iterator program. The following patches are present in the series: * Patch 1 adds a generic per-cpu counter to struct bpf_map * Patch 2 utilizes this mechanism for hash-based maps * Patch 3 extends the preloaded map iterator to dump the sum * Patch 4 adds a self-test for the change The reason for adding this functionality in our case (Cilium) is to get signals about how full some heavy-used maps are and what the actual dynamic profile of map capacity is. In the case of LRU maps this is impossible to get this information anyhow else. See also [1]. This is a v2 for the https://lore.kernel.org/bpf/20230531110511.64612-1-aspsk@xxxxxxxxxxxxx/T/#t It was rewritten according to previous comments. I've turned this series into an RFC for two reasons: 1) This patch only works on systems where this_cpu_{inc,dec} is atomic for s64. For systems which might write s64 non-atomically this would be required to use some locking mechanism to prevent readers from reading trash via the bpf_map_sum_elements_counter() kfunc (see patch 1) 2) In comparison with the v1, we're adding extra instructions per map operation (for preallocated maps, as well as for non-preallocated maps). The only functionality we're interested at the moment is the number of elements present in a map, not a per-cpu statistics. This could be better achieved by using the v1 version, which only adds computations for preallocated maps. So, the question is: won't it be fine to do the changes in the following way: * extend the preallocated hash maps to populate percpu batch counters as in v1 * add a kfunc as in v2 to get the current sum This works as * nobody at the moment actually requires the per-cpu statistcs * this implementation can be transparently turned into per-cpu statistics, if such a need occurs on practice (the only thing to change would be to re-implement the kfunc and, maybe, add more kfuncs to get per-cpu stats) * the "v1 way" is the least intrusive: it only affects preallocated maps, as other maps already provide the required functionality [1] https://lpc.events/event/16/contributions/1368/ v1 -> v2: - make the counters generic part of struct bpf_map - don't use map_info and /proc/self/fdinfo in favor of a kfunc Anton Protopopov (4): bpf: add percpu stats for bpf_map elements insertions/deletions bpf: populate the per-cpu insertions/deletions counters for hashmaps bpf: make preloaded map iterators to display map elements count selftests/bpf: test map percpu stats include/linux/bpf.h | 30 + kernel/bpf/hashtab.c | 102 ++-- kernel/bpf/map_iter.c | 48 +- kernel/bpf/preload/iterators/iterators.bpf.c | 9 +- .../iterators/iterators.lskel-little-endian.h | 513 +++++++++--------- .../bpf/map_tests/map_percpu_stats.c | 336 ++++++++++++ .../selftests/bpf/progs/map_percpu_stats.c | 24 + 7 files changed, 766 insertions(+), 296 deletions(-) create mode 100644 tools/testing/selftests/bpf/map_tests/map_percpu_stats.c create mode 100644 tools/testing/selftests/bpf/progs/map_percpu_stats.c -- 2.34.1