Re: Poor performance of bpf_map_update_elem() for BPF_MAP_TYPE_HASH_OF_MAPS / BPF_MAP_TYPE_ARRAY_OF_MAPS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Feb 8, 2025 at 5:15 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Sat, Feb 8, 2025 at 2:39 AM Ritesh Oedayrajsingh Varma
> <ritesh@xxxxxxxxxxxxxxx> wrote:
> >
> > On Sat, Feb 8, 2025 at 4:58 AM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Wed, Feb 5, 2025 at 4:58 AM Ritesh Oedayrajsingh Varma
> > > <ritesh@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > Given this, while it's not possible to remove the wait entirely
> > > > without breaking user space, I was wondering if it would be
> > > > possible/acceptable to add a way to opt-out of this behavior for
> > > > programs like ours that don't care about this. One way to do so could
> > > > be to add an additional flag to the BPF_MAP_CREATE flags, perhaps
> > > > something like BPF_F_INNER_MAP_NO_SYNC.
> > >
> > > Sounds reasonable.
> > > The flag name is a bit cryptic, BPF_F_UPDATE_INNER_MAP_NO_WAIT
> > > is a bit more explicit, but I'm not sure whether it's better.
> >
> > I agree the name is a bit cryptic. A related question is whether the
> > optimization to skip the RCU wait should only apply to the update, or
> > also to delete. I think it would make sense for it to apply to all
> > operations. What do you think?
> >
> > I also realized the flag should technically apply to the *outer* map,
> > since that's the map that's actually being modified and synced on, not
> > the inner map. So I don't think "inner" should be part of the name in
> > retrospect.
>
> BPF_F_UPDATE_INNER_MAP_NO_WAIT suppose to mean update_OF_inner_map.
>
> > Perhaps BPF_F_MAP_OF_MAPS_NO_WAIT or
> > BPF_F_MAP_IN_MAP_NO_WAIT? I'm slightly leaning towards the latter
> > because the map of maps support code is also located in map_in_map.c,
> > so that matches nicely. They're both a bit long though. Either way,
> > the definition of the outer map when using this flag would become
> > something like:
> >
> > struct {
> >     __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS);
> >     __uint(max_entries, 4096);
> >     __type(key, u64);
> >     __type(value, u32);
> >     __uint(map_flags, BPF_F_MAP_IN_MAP_NO_WAIT);
> > } mapInMap SEC(".maps");
>
> Actually we probably should make it similar to BPF_NOEXIST/ANY flag,
> so it can be passed to update/delete command instead of
> being a global property of the map.
>
> BPF_F_MAP_IN_MAP_NO_WAIT name sounds fine.

Thanks. Making it part of the update/delete flags makes sense. There
is already something similar in BPF_F_LOCK. Looking at the code, that
does mean it would probably make the most sense to perform the check
for such a flag outside of maybe_wait_bpf_programs() and make the
calls to maybe_wait_bpf_programs() conditional on the flag in
map_update_elem() / map_delete_elem() / bpf_map_do_batch().

>
> But before we add the uapi flag please benchmark
> synchronize_rcu_expedited() as Hou suggested.

I'll need to find some time for this, but will do.

>
> > > Also have you considered a batched update? There will be
> > > only one sync_rcu() for the whole batch.
> >
> > We have, yes, but in our case, these updates are a result of another
> > system generating events, and it is a bit hard to batch them together:
> > it would involve waiting for multiple events to arrive, instead of
> > processing events as they come in, which introduces an additional
> > layer of latency by itself.
> >
> > Regarding batched update, we've found that it is also very slow when
> > inserting a large number of elements. In one example where we inserted
> > ~1.2 million 16-byte elements, we found it took 4-500 milliseconds to
> > update the map via batched update. This is due to the overhead of
> > generic_map_update_batch() individually copying each element to be
> > inserted from user space via copy_from_user(); almost all time is
> > going to __check_object_size() called by copy_from_user(). I suspect
> > this one is hard to fix though, due to how the elements in a map are
> > laid out in memory; it would be hard to change such batched updates
> > into a single copy. But perhaps that's something for a different
> > thread (and we can easily work around it on our side).
>
> Disable CONFIG_HARDENED_USERCOPY.

This is part of an application we're building (a new CPU profiler)
that will be shipped to end users, where we have no control over the
flags their kernel is compiled with. Nevertheless, as mentioned we can
easily work around this by not inserting individual elements, but by
batching them together, thus reducing the copies to fewer-but-larger
copies. So this is not really a problem for us.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux