Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.

Christoph Lameter <cl@xxxxxxxxx> · Tue, 28 Jun 2022 15:57:54 +0200 (CEST)

On Mon, 27 Jun 2022, Alexei Starovoitov wrote:

> On Mon, Jun 27, 2022 at 5:17 PM Christoph Lameter <cl@xxxxxxxxx> wrote:
> >
> > > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> > >
> > > Introduce any context BPF specific memory allocator.
> > >
> > > Tracing BPF programs can attach to kprobe and fentry. Hence they
> > > run in unknown context where calling plain kmalloc() might not be safe.
> > > Front-end kmalloc() with per-cpu per-bucket cache of free elements.
> > > Refill this cache asynchronously from irq_work.
> >
> > GFP_ATOMIC etc is not going to work for you?
>
> slab_alloc_node->slab_alloc->local_lock_irqsave
> kprobe -> bpf prog -> slab_alloc_node -> deadlock.
> In other words, the slow path of slab allocator takes locks.

That is a relatively new feature due to RT logic support. Without RT this
would be a simple irq disable.

Generally, doing slab allocation while debugging slab allocation is not
something that can work. Can we exempt RT locks/irqsave or slab alloc from
BPF tracing?

I would assume that other key items of kernel logic will have similar
issues.

> Which makes it unsafe to use from tracing bpf progs.
> That's why we preallocated all elements in bpf maps,
> so there are no calls to mm or rcu logic.
> bpf specific allocator cannot use locks at all.
> try_lock approach could have been used in alloc path,
> but free path cannot fail with try_lock.
> Hence the algorithm in this patch is purely lockless.
> bpf prog can attach to spin_unlock_irqrestore and
> safely do bpf_mem_alloc.

That is generally safe unless you get into reentrancy issues with memory
allocation.
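
To make the reentrancy concern concrete, here is a minimal user-space
sketch. It is not code from the patch set: struct cpu_cache, cache_alloc(),
the trace_hook callback and the "active" counter are all hypothetical names,
and a plain int stands in for what would have to be a per-cpu counter in the
kernel. The hook plays the role of a tracing program that fires in the
middle of an allocation and tries to allocate again. One possible guard is
to let only the outermost caller touch the free list and make the nested
call fail.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; this is not the API from the patch. */
struct elem {
	struct elem *next;
};

struct cpu_cache {
	struct elem *free_list;                 /* preallocated free elements */
	int active;                             /* nonzero while an alloc is in progress */
	int nested_failures;                    /* how often a nested alloc was refused */
	void (*trace_hook)(struct cpu_cache *); /* stands in for an attached tracing prog */
};

static struct elem *cache_alloc(struct cpu_cache *c)
{
	struct elem *e = NULL;

	if (++c->active == 1) {
		/* Outermost caller: the only one allowed to touch the list. */
		e = c->free_list;
		/* A probe fires here, between reading the head and updating it. */
		if (c->trace_hook)
			c->trace_hook(c);
		if (e)
			c->free_list = e->next;
	} else {
		/* Re-entered while the list is mid-update: refuse, do not touch it. */
		c->nested_failures++;
	}
	c->active--;
	return e;
}

/* Result of the nested allocation attempted from inside the hook. */
static struct elem *nested_result;

static void hook_alloc_again(struct cpu_cache *c)
{
	nested_result = cache_alloc(c);
}

/* Returns 1 if the outer alloc succeeded and the nested one was refused. */
static int demo_reentrant_alloc(void)
{
	struct elem pool[2] = { { &pool[1] }, { NULL } };
	struct cpu_cache c = {
		.free_list = &pool[0],
		.trace_hook = hook_alloc_again,
	};
	struct elem *outer;

	nested_result = &pool[0];       /* sentinel, overwritten by the hook */
	outer = cache_alloc(&c);

	assert(c.active == 0);          /* counter is balanced on every path */
	return outer == &pool[0] && nested_result == NULL &&
	       c.free_list == &pool[1] && c.nested_failures == 1;
}
```

Without the counter check, the nested call would read the same list head as
the outer call and both would return the same element. The sketch only
covers the alloc side. A free path that must not fail would need somewhere
to park elements when it is re-entered, which is not shown here.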

Which raises the question:

What happens if I try to use BPF to trace *your* shiny new memory
allocation functions in the BPF logic like bpf_mem_alloc? How do you stop
that from happening?