On Sat, Feb 8, 2025 at 2:39 AM Ritesh Oedayrajsingh Varma <ritesh@xxxxxxxxxxxxxxx> wrote: > > On Sat, Feb 8, 2025 at 4:58 AM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Wed, Feb 5, 2025 at 4:58 AM Ritesh Oedayrajsingh Varma > > <ritesh@xxxxxxxxxxxxxxx> wrote: > > > > > > Given this, while it's not possible to remove the wait entirely > > > without breaking user space, I was wondering if it would be > > > possible/acceptable to add a way to opt-out of this behavior for > > > programs like ours that don't care about this. One way to do so could > > > be to add an additional flag to the BPF_MAP_CREATE flags, perhaps > > > something like BPF_F_INNER_MAP_NO_SYNC. > > > > Sounds reasonable. > > The flag name is a bit cryptic, BPF_F_UPDATE_INNER_MAP_NO_WAIT > > is a bit more explicit, but I'm not sure whether it's better. > > I agree the name is a bit cryptic. A related question is whether the > optimization to skip the RCU wait should only apply to the update, or > also to delete. I think it would make sense for it to apply to all > operations. What do you think? > > I also realized the flag should technically apply to the *outer* map, > since that's the map that's actually being modified and synced on, not > the inner map. So I don't think "inner" should be part of the name in > retrospect. BPF_F_UPDATE_INNER_MAP_NO_WAIT suppose to mean update_OF_inner_map. > Perhaps BPF_F_MAP_OF_MAPS_NO_WAIT or > BPF_F_MAP_IN_MAP_NO_WAIT? I'm slightly leaning towards the latter > because the map of maps support code is also located in map_in_map.c, > so that matches nicely. They're both a bit long though. Either way, > the definition of the outer map when using this flag would become > something like: > > struct { > __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS); > __uint(max_entries, 4096); > __type(key, u64); > __type(value, u32); > __uint(map_flags, BPF_F_MAP_IN_MAP_NO_WAIT); > } mapInMap SEC(".maps"); Actually we probably should make it similar to BPF_NOEXIST/ANY flag, so it can be passed to update/delete command instead of being a global property of the map. BPF_F_MAP_IN_MAP_NO_WAIT name sounds fine. But before we add the uapi flag please benchmark synchronize_rcu_expedited() as Hou suggested. > > Also have you considered a batched update? There will be > > only one sync_rcu() for the whole batch. > > We have, yes, but in our case, these updates are a result of another > system generating events, and it is a bit hard to batch them together: > it would involve waiting for multiple events to arrive, instead of > processing events as they come in, which introduces an additional > layer of latency by itself. > > Regarding batched update, we've found that it is also very slow when > inserting a large number of elements. In one example where we inserted > ~1.2 million 16-byte elements, we found it took 4-500 milliseconds to > update the map via batched update. This is due to the overhead of > generic_map_update_batch() individually copying each element to be > inserted from user space via copy_from_user(); almost all time is > going to __check_object_size() called by copy_from_user(). I suspect > this one is hard to fix though, due to how the elements in a map are > laid out in memory; it would be hard to change such batched updates > into a single copy. But perhaps that's something for a different > thread (and we can easily work around it on our side). Disable CONFIG_HARDENED_USERCOPY.