Hi, We are in a situation where we're frequently updating a BPF_MAP_TYPE_HASH_OF_MAPS with new data for a given key via bpf_map_update_elem(). During profiling, we've noticed that bpf_map_update_elem() on such maps is _very_ expensive. In our tests, the average time is ~9ms per call, with spikes to ~45ms per call: Function Name: bpf_map_update_elem Number of calls: 1213 Total time: 11s 880ms 994µs Maximum: 45ms 431µs Top Quartile: 11ms 660µs Average: 9ms 794µs Median: 9ms 218µs Bottom Quartile: 7ms 363µs Minimum: 23µs The cause of this poor performance is the wait for the RCU grace period when map_update_elem() is called: after the update has completed without errors, it calls maybe_wait_bpf_programs() which in turn calls synchronize_rcu() for BPF_MAP_TYPE_HASH_OF_MAPS (and BPF_MAP_TYPE_ARRAY_OF_MAPS). As I understand from the commit that introduced this [1], the RCU GP wait was added to ensure that user space could be guaranteed that after the update, no BPF programs are still looking at the old value of the map [2]. When this commit was introduced, the RCU GP wait also covered a potential UAF when updating the outer map while a BPF program was still looking at the old inner map. That UAF was (much) later addressed by a different patchset [3] and the discussion in that patchset [4] mentions that maybe_wait_bpf_programs() is not needed anymore with the UAF fixes: > So, you're correct, maybe_wait_bpf_programs() is not sufficient any more, > but we cannot delete it, since it addresses user space assumptions > on what bpf progs see when the inner map is replaced. Given this, while it's not possible to remove the wait entirely without breaking user space, I was wondering if it would be possible/acceptable to add a way to opt-out of this behavior for programs like ours that don't care about this. One way to do so could be to add an additional flag to the BPF_MAP_CREATE flags, perhaps something like BPF_F_INNER_MAP_NO_SYNC. There are already map-specific flags in there (for example, BPF_F_NO_COMMON_LRU or BPF_F_STACK_BUILD_ID), so it would fit with that pattern; maybe_wait_bpf_programs() could then check the map flags and only perform the wait if the flag is not set (which is the default). In our case, we don't care if running BPF programs are still working with the old map, but for the thousands of bpf_map_update_elem() calls we're doing in certain situations, we're spending _seconds_ waiting on the RCU GP, so adding something like this would greatly improve the latency in our scenarios. If this sounds like something that would be acceptable, I'd be happy to make the change and send a patch, of course. Any thoughts on this are appreciated! [1] commit 1ae80cf31938 ("bpf: wait for running BPF programs when updating map-in-map") [2] https://lore.kernel.org/lkml/20181111221706.032923266@xxxxxxxxxxxxxxxxxxx/ [3] https://lore.kernel.org/bpf/20231113123324.3914612-1-houtao@xxxxxxxxxxxxxxx/ [4] https://lore.kernel.org/bpf/CAADnVQK=tJRhQY1zfLK2n7_tPA5+vN8+KqWmSLqjubUuh6UFAw@xxxxxxxxxxxxxx/ Cheers, Ritesh