Re: [PATCH bpf v1 2/3] bpf: Don't reinit map value in prealloc_lru_pop


 



On Tue, 9 Aug 2022 at 05:18, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Aug 8, 2022 at 5:25 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
> > >
> > > Thinking again. I guess the following scenario is possible:
> > >
> > >       rcu_read_lock()
> > >          v = bpf_map_lookup_elem(&key);
> > >          t1 = v->field;
> > >          bpf_map_delete_elem(&key);
> > >             /* another bpf program triggering bpf_map_update_elem() */
> > >             /* the element with 'key' is reused */
> > >             /* the value is updated */
> > >          t2 = v->field;
> > >          ...
> > >       rcu_read_unlock()
> > >
> > > it is possible t1 and t2 may not be the same.
> > > This should be an extremely corner case, not sure how to resolve
> > > this easily without performance degradation.
> >
> > Yes, this is totally possible. This extreme corner case may also
> > become a real problem in sleepable programs ;-).
> >
> > I had an idea in mind on how it can be done completely in the BPF
> > program side without changing the map implementation.
>
> +1 it's best to address it inside the program instead
> of complicating and likely slowing down maps.
>
> As far as sleepable progs.. the issue is not present
> because sleepable progs are only allowed to use preallocated maps.

The problem being discussed in this thread is exactly with
preallocated maps, so I do think the issue is present with sleepable
progs. I.e. a program can do a lookup, go to sleep (implicitly or
explicitly), and then find that the element has already been reused
(the window is wider, so reuse is more likely). It's not a smart
thing to do, of course, but my point was that compared to the
non-sleepable case it is no longer a corner case that you rarely hit.
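
To make that concrete, roughly something like the sketch below (the
program type, map and field names are just for illustration, and
bpf_copy_from_user() is only called with a junk pointer so it fails
harmlessly; it only marks the point where the program may sleep):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 1024);
        __type(key, u32);
        __type(value, u64);
} prealloc_map SEC(".maps"); /* no BPF_F_NO_PREALLOC, so preallocated */

SEC("lsm.s/file_open")
int BPF_PROG(check_open, struct file *file)
{
        u32 key = 0;
        u64 *v, before, after, scratch;

        v = bpf_map_lookup_elem(&prealloc_map, &key);
        if (!v)
                return 0;

        before = *v;

        /* Sleepable helper, may fault and sleep. While we sleep,
         * another program or userspace can delete the element, and a
         * later update can hand the same preallocated slot out again,
         * possibly for a different key.
         */
        bpf_copy_from_user(&scratch, sizeof(scratch), (void *)0);

        after = *v; /* may now belong to a different key/value */
        if (before != after)
                bpf_printk("element reused under us");
        return 0;
}

char LICENSE[] SEC("license") = "GPL";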

> In non-prealloc maps free_htab_elem() would just call_rcu()
> and the element would be freed into the global slab which
> will be bad if bpf prog is doing something like above.
>
> Doing call_rcu_tasks_trace() + call_rcu() would allow
> non-prealloc in sleepable, but it doesn't scale
> and non-prealloc is already many times slower comparing
> to prealloc due to atomic_inc/dec and call_rcu.

Exactly, which means that in the sleepable case, where only
preallocated maps are allowed, the element will be readily reused,
which takes us back to the problem above.
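
FWIW, one way to handle this kind of check entirely on the BPF
program side could be a seqcount-style generation counter kept inside
the map value itself: writers bump it around every update, readers
discard whatever they read if the counter changed under them. A rough
sketch (field names and map type are just illustrative, and it only
helps if every writer follows the same protocol; a plain
bpf_map_update_elem() that copies in a whole new value from elsewhere
would not):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct val {
        u64 gen;   /* even: stable, odd: write in progress */
        u64 data;
};

struct {
        __uint(type, BPF_MAP_TYPE_LRU_HASH);
        __uint(max_entries, 1024);
        __type(key, u32);
        __type(value, struct val);
} m SEC(".maps");

SEC("tc")
int writer(struct __sk_buff *skb)
{
        u32 key = 0;
        struct val *v;

        v = bpf_map_lookup_elem(&m, &key);
        if (!v)
                return 0;

        __sync_fetch_and_add(&v->gen, 1);  /* odd: write in progress */
        v->data = skb->len;
        __sync_fetch_and_add(&v->gen, 1);  /* even: write complete */
        return 0;
}

SEC("tc")
int reader(struct __sk_buff *skb)
{
        u32 key = 0;
        struct val *v;
        u64 g1, g2, d;

        v = bpf_map_lookup_elem(&m, &key);
        if (!v)
                return 0;

        g1 = *(volatile u64 *)&v->gen;
        if (g1 & 1)
                return 0;          /* write in progress, skip */
        d = *(volatile u64 *)&v->data;
        g2 = *(volatile u64 *)&v->gen;
        if (g1 != g2)
                return 0;          /* updated/reused under us, discard */

        /* d is a consistent snapshot here */
        bpf_printk("data=%llu", d);
        return 0;
}

char LICENSE[] SEC("license") = "GPL";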

>
> With new bpf_mem_alloc I'm experimenting with batching
> of call_rcu_tasks_trace + call_rcu, so hash map can be
> dynamically allocated, just as fast as full prealloc and
> finally safe in both sleepable and traditional progs.
> I was thinking of adding a new SLAB_TYPESAFE_BY_RCU_TASKS_TRACE
> flag to the slab, but decided to go that route only if
> batching of call_rcu*() won't be sufficient.

Interesting stuff, looking forward to it!

> Sorry it takes so long to get the patches ready.
> Summer in WA is the best time to take vacation :)
> I've been taking plenty.

Well, I think this thread was just Yonghong and me contemplating
other possible fixes to the original bug, so there's no rush. Enjoy
yourself! :)




