Re: Poor performance of bpf_map_update_elem() for BPF_MAP_TYPE_HASH_OF_MAPS / BPF_MAP_TYPE_ARRAY_OF_MAPS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Feb 8, 2025 at 2:39 AM Ritesh Oedayrajsingh Varma
<ritesh@xxxxxxxxxxxxxxx> wrote:
>
> On Sat, Feb 8, 2025 at 4:58 AM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Wed, Feb 5, 2025 at 4:58 AM Ritesh Oedayrajsingh Varma
> > <ritesh@xxxxxxxxxxxxxxx> wrote:
> > >
> > > Given this, while it's not possible to remove the wait entirely
> > > without breaking user space, I was wondering if it would be
> > > possible/acceptable to add a way to opt-out of this behavior for
> > > programs like ours that don't care about this. One way to do so could
> > > be to add an additional flag to the BPF_MAP_CREATE flags, perhaps
> > > something like BPF_F_INNER_MAP_NO_SYNC.
> >
> > Sounds reasonable.
> > The flag name is a bit cryptic, BPF_F_UPDATE_INNER_MAP_NO_WAIT
> > is a bit more explicit, but I'm not sure whether it's better.
>
> I agree the name is a bit cryptic. A related question is whether the
> optimization to skip the RCU wait should only apply to the update, or
> also to delete. I think it would make sense for it to apply to all
> operations. What do you think?
>
> I also realized the flag should technically apply to the *outer* map,
> since that's the map that's actually being modified and synced on, not
> the inner map. So I don't think "inner" should be part of the name in
> retrospect.

BPF_F_UPDATE_INNER_MAP_NO_WAIT suppose to mean update_OF_inner_map.

> Perhaps BPF_F_MAP_OF_MAPS_NO_WAIT or
> BPF_F_MAP_IN_MAP_NO_WAIT? I'm slightly leaning towards the latter
> because the map of maps support code is also located in map_in_map.c,
> so that matches nicely. They're both a bit long though. Either way,
> the definition of the outer map when using this flag would become
> something like:
>
> struct {
>     __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
>     __uint(max_entries, 4096);
>     __type(key, u64);
>     __type(value, u32);
>     __uint(map_flags, BPF_F_MAP_IN_MAP_NO_WAIT);
> } mapInMap SEC(".maps");

Actually we probably should make it similar to BPF_NOEXIST/ANY flag,
so it can be passed to update/delete command instead of
being a global property of the map.

BPF_F_MAP_IN_MAP_NO_WAIT name sounds fine.

But before we add the uapi flag please benchmark
synchronize_rcu_expedited() as Hou suggested.

> > Also have you considered a batched update? There will be
> > only one sync_rcu() for the whole batch.
>
> We have, yes, but in our case, these updates are a result of another
> system generating events, and it is a bit hard to batch them together:
> it would involve waiting for multiple events to arrive, instead of
> processing events as they come in, which introduces an additional
> layer of latency by itself.
>
> Regarding batched update, we've found that it is also very slow when
> inserting a large number of elements. In one example where we inserted
> ~1.2 million 16-byte elements, we found it took 4-500 milliseconds to
> update the map via batched update. This is due to the overhead of
> generic_map_update_batch() individually copying each element to be
> inserted from user space via copy_from_user(); almost all time is
> going to __check_object_size() called by copy_from_user(). I suspect
> this one is hard to fix though, due to how the elements in a map are
> laid out in memory; it would be hard to change such batched updates
> into a single copy. But perhaps that's something for a different
> thread (and we can easily work around it on our side).

Disable CONFIG_HARDENED_USERCOPY.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux