Re: [PATCH v2 bpf-next 01/12] bpf: Introduce any context BPF specific memory allocator.

On Thu, Aug 18, 2022 at 01:51:39AM +0200, Kumar Kartikeya Dwivedi wrote:
> > +
> > +/* notrace is necessary here and in other functions to make sure
> > + * bpf programs cannot attach to them and cause llist corruptions.
> > + */
> > +static void notrace *unit_alloc(struct bpf_mem_cache *c)
> > +{
> > +       bool in_nmi = bpf_in_nmi();
> > +       struct llist_node *llnode;
> > +       unsigned long flags;
> > +       int cnt = 0;
> > +
> > +       if (unlikely(in_nmi)) {
> > +               llnode = llist_del_first(&c->free_llist_nmi);
> > +               if (llnode)
> > +                       cnt = atomic_dec_return(&c->free_cnt_nmi);
> 
> I am trying to understand which case this
> atomic_dec_return/atomic_inc_return protects against in the
> unit_alloc/unit_free for the in_nmi branch. Is it protecting against a
> nested NMI BPF prog interrupting an NMI prog?
> 
> In case of perf it seems we use bpf_prog_active, 

yes, but bpf_prog_active has plenty of downsides and hopefully
will be replaced eventually with a cleaner mechanism.
Separate topic.

> so nested NMI prog
> won't be invoked while we are interrupted inside a BPF program in NMI
> context. Which are the other cases that might cause reentrancy in this
> branch such that we need atomics instead of c->free_cnt_nmi--? Or are
> you anticipating you might allow this in the future even if it is
> disallowed for now?
> 
> If programs are allowed to stack like this, and we try to reason about
> the safety of the llist_del_first operation, the code is:
> 
> struct llist_node *llist_del_first(struct llist_head *head)
> {
>      struct llist_node *entry, *old_entry, *next;
> 
>      entry = smp_load_acquire(&head->first);
>      for (;;) {
>          if (entry == NULL)
>              return NULL;
>          old_entry = entry;
>          next = READ_ONCE(entry->next);
> >>>>>>>> Suppose a nested NMI comes at this point and a BPF prog is invoked.

llist_del_first is notrace.
unit_alloc() above is also notrace; see the comment before it.
A perf event overflow NMI can happen here, but for some other llist.
Hence we're talking about NMI issues only here; fentry progs do not apply.
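
For reference, notrace comes from include/linux/compiler_types.h and looks
roughly like the line below (the exact attribute depends on compiler/config).
The effect is that no mcount/fentry call site is emitted for the function,
so ftrace-based and fentry bpf progs have nothing to attach to:

/* roughly from include/linux/compiler_types.h; exact form varies by config */
#define notrace	__attribute__((__no_instrument_function__))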

> Assume the current nmi free llist is HEAD -> A -> B -> C -> D -> ...
> For our cmpxchg, parameters are going to be cmpxchg(&head->first, A, B);
> 
> Now, the nested NMI prog does unit_alloc thrice. This does llist_del_first thrice.

Even double llist_del_first on the same llist is bad. That's a known fact.

> This makes nmi free llist HEAD -> D -> ...
> A, B, C are allocated in prog.
> Now it does unit_free of all three, but in the order B, C, A.
> unit_free does llist_add, so the nmi free llist becomes HEAD -> A -> C -> B -> D -> ...
> 
> Nested NMI prog exits.
> We continue with our cmpxchg(&head->first, A, B); it succeeds, A is
> returned, but C will be leaked.

This exact scenario cannot happen for bpf_mem_cache's freelist.
unit_alloc is doing llist_del_first on a per-cpu freelist.
We can have two perf_event bpf progs. Both progs could
share the same hash map and use the same struct bpf_mem_alloc,
and both would call unit_alloc() on the same per-cpu freelist,
but as you noticed bpf_prog_active covers that case.
bpf_prog_active is too coarse, as we discussed in the other thread a
month or so ago: it prevents valid and safe execution of bpf progs, loses
events, etc. We will surely come up with a better mechanism.
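
For context, that guard looks roughly like this (heavily simplified from
trace_call_bpf(); the wrapper name below is made up, not a real kernel
function). If a prog is already active on this cpu, the nested one is simply
skipped, i.e. the event is dropped, which is exactly the coarseness above:

static DEFINE_PER_CPU(int, bpf_prog_active);

static unsigned int run_prog_guarded(struct bpf_prog *prog, void *ctx)
{
	unsigned int ret = 0;

	/* Another bpf prog is already running on this cpu (e.g. the
	 * NMI landed inside a prog), so don't nest: drop the event
	 * instead of re-entering.
	 */
	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
		goto out;

	ret = bpf_prog_run(prog, ctx);
out:
	__this_cpu_dec(bpf_prog_active);
	return ret;
}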

Going back to your earlier question:

> Which are the other cases that might cause reentrancy in this
> branch such that we need atomics instead of c->free_cnt_nmi--?

It's the case where a perf_event bpf prog fires inside bpf_mem_refill in irq_work.
bpf_mem_refill manipulates free_cnt_nmi, and the nmi bpf prog does too via unit_alloc.
Which got me thinking that there is indeed a missing check here:
we need to protect free_bulk_nmi's llist_del_first from unit_alloc's llist_del_first.
bpf_prog_active could be used for that, but let's come up with a cleaner way.
Probably going to add an atomic_t flag to bpf_mem_cache and cmpxchg it,
or a lock and spin_trylock it. tbd.
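
Roughly something like this, just as a sketch of the atomic_t option
(field and helper names are made up; nothing here is final):

/* assumes a new 'atomic_t nmi_del_active' field in struct bpf_mem_cache */
static struct llist_node *del_first_nmi(struct bpf_mem_cache *c)
{
	struct llist_node *llnode;

	/* Only one llist_del_first user per llist at a time. If the
	 * flag is already set, the irq_work refill or a nested nmi
	 * prog is in the critical section; back off instead of
	 * racing and corrupting the llist.
	 */
	if (atomic_cmpxchg(&c->nmi_del_active, 0, 1) != 0)
		return NULL;
	llnode = llist_del_first(&c->free_llist_nmi);
	atomic_set(&c->nmi_del_active, 0);
	return llnode;
}

Both unit_alloc() and free_bulk_nmi() would go through such a helper, so a
failed cmpxchg just means "treat the freelist as empty" on that attempt.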


