On Sat, Feb 8, 2025 at 4:58 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Feb 5, 2025 at 4:58 AM Ritesh Oedayrajsingh Varma
> <ritesh@xxxxxxxxxxxxxxx> wrote:
> >
> > Given this, while it's not possible to remove the wait entirely
> > without breaking user space, I was wondering if it would be
> > possible/acceptable to add a way to opt-out of this behavior for
> > programs like ours that don't care about this. One way to do so could
> > be to add an additional flag to the BPF_MAP_CREATE flags, perhaps
> > something like BPF_F_INNER_MAP_NO_SYNC.
>
> Sounds reasonable.
> The flag name is a bit cryptic, BPF_F_UPDATE_INNER_MAP_NO_WAIT
> is a bit more explicit, but I'm not sure whether it's better.

I agree the name is a bit cryptic. A related question is whether the
optimization to skip the RCU wait should apply only to updates, or also
to deletes. I think it would make sense for it to apply to all
operations. What do you think?

I also realized the flag should technically apply to the *outer* map,
since that's the map that is actually being modified and synced on, not
the inner map. So, in retrospect, I don't think "inner" should be part
of the name. Perhaps BPF_F_MAP_OF_MAPS_NO_WAIT or
BPF_F_MAP_IN_MAP_NO_WAIT? I'm slightly leaning towards the latter
because the map-of-maps support code also lives in map_in_map.c, so
that matches nicely. They're both a bit long, though. Either way, the
definition of the outer map when using this flag would become something
like:

struct {
        __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
        __uint(max_entries, 4096);
        __type(key, u64);
        __type(value, u32);
        __uint(map_flags, BPF_F_MAP_IN_MAP_NO_WAIT);
} mapInMap SEC(".maps");

> Also have you considered a batched update? There will be
> only one sync_rcu() for the whole batch.

We have, yes, but in our case these updates are the result of another
system generating events, and it is a bit hard to batch them together:
it would mean waiting for multiple events to arrive instead of
processing each event as it comes in, which by itself introduces an
additional layer of latency.

Regarding batched updates, we've also found them to be very slow when
inserting a large number of elements. In one example where we inserted
~1.2 million 16-byte elements, the batched update took 400-500
milliseconds. This is due to the overhead of generic_map_update_batch()
copying each element to be inserted from user space individually via
copy_from_user(); almost all of the time goes to __check_object_size(),
called from copy_from_user(). I suspect this one is hard to fix,
though, given how the elements of a map are laid out in memory; it
would be difficult to turn such a batched update into a single copy.
But perhaps that's something for a different thread (and we can easily
work around it on our side).

Thanks,
Ritesh
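
P.S. For concreteness, here is a rough sketch of the batched-update
path we measured, assuming a reasonably recent libbpf (the
LIBBPF_OPTS/bpf_map_update_batch API). The names map_fd, NR_ELEMS and
the key/value layout are placeholders for illustration, not our actual
code:

#include <bpf/bpf.h>
#include <errno.h>
#include <stdlib.h>

#define NR_ELEMS 4096 /* placeholder element count */

static int update_all(int map_fd)
{
        /* Placeholder key/value layout, not our real 16-byte elements. */
        __u64 *keys = calloc(NR_ELEMS, sizeof(__u64));
        __u64 *vals = calloc(NR_ELEMS, sizeof(__u64));
        __u32 count = NR_ELEMS;
        LIBBPF_OPTS(bpf_map_batch_opts, opts,
                .elem_flags = BPF_ANY);
        int err;

        if (!keys || !vals) {
                free(keys);
                free(vals);
                return -ENOMEM;
        }

        /* ... fill keys[] and vals[] from the incoming events ... */

        /* One syscall for the whole batch (and, for a map-in-map, a
         * single sync_rcu()), but generic_map_update_batch() still
         * copies each element individually via copy_from_user(). */
        err = bpf_map_update_batch(map_fd, keys, vals, &count, &opts);

        free(keys);
        free(vals);
        return err;
}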