On Wed, Jan 25, 2023 at 11:59 AM Yaniv Agman <yanivagman@xxxxxxxxx> wrote:
>
> > Anyway back to preempt_disable(). Think of it as a giant spin_lock
> > that covers the whole program. In preemptable kernels it hurts
> > tail latency and fairness, and is completely unacceptable in RT.
> > That's why we moved to migrate_disable.
> > Technically we can add a bpf_preempt_disable() kfunc, but if we do that
> > we'll be back to square one. The issues with preemption and RT
> > will reappear. So let's figure out a different solution.
> > Why not use a scratch buffer per program?
>
> Totally understand the reason for avoiding preemption disable,
> especially in RT kernels.
> I believe the answer for why not to use a scratch buffer per program
> is simply memory space.
> In our use case, Tracee [1], we let the user choose whichever events to
> trace for a specific workload.
> This list of events is very big, and we have many BPF programs
> attached to different places in the kernel.
> Let's assume that we have 100 events, and for each event we have a
> different BPF program.
> Then having 32KB per-cpu scratch buffers translates to 3.2MB per
> cpu, and ~100MB for 32 CPUs, which is more common in our case.

Well, 100 bpf progs consume at least a page each, so you might want
one program attached to all events.

> Since we always add new events to Tracee, this will also not be scalable.
> Yet, if there is no other solution, I believe we will go in that direction.
>
> [1] https://github.com/aquasecurity/tracee/blob/main/pkg/ebpf/c/tracee.bpf.c

You're talking about
BPF_PERCPU_ARRAY(scratch_map, scratch_t, 1);
?
Instead of a scratch_map per program, use an atomic per-cpu counter for
recursion. You'll have 3 levels of nesting in the worst case, so it becomes:
BPF_PERCPU_ARRAY(scratch_map, scratch_t, 3);
On prog entry increment the recursion counter, on exit decrement it,
and use that level's scratch_t slot in the prog.
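To make the suggested pattern concrete, here is a minimal userspace C sketch of the idea: one 3-slot scratch array plus an atomic recursion counter per CPU, bumped on entry and dropped on exit, with the counter's old value selecting the slot. The names (scratch_t, MAX_NEST, scratch_enter/scratch_exit) are illustrative, not any real BPF API; in an actual BPF program the array would be the BPF_PERCPU_ARRAY shown above, the counter would live in another per-cpu map, and __sync_fetch_and_add would operate on the map value.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NEST 3                 /* worst case: task, softirq, hardirq */

typedef struct { char buf[32768]; } scratch_t;

/* Stand-ins for one CPU's slice of the per-cpu maps. Per-cpu data is
 * only ever touched by one CPU, so a plain array and int suffice here;
 * the atomics guard against preemption/interrupt nesting, not SMP. */
static scratch_t scratch_map[MAX_NEST];
static int recursion_cnt;

/* On prog entry: bump the counter and claim the scratch slot for this
 * nesting level. Returns NULL (prog should bail out) past MAX_NEST. */
static scratch_t *scratch_enter(void)
{
    int lvl = __sync_fetch_and_add(&recursion_cnt, 1);

    if (lvl >= MAX_NEST) {
        __sync_fetch_and_sub(&recursion_cnt, 1);
        return NULL;
    }
    return &scratch_map[lvl];
}

/* On prog exit: release the slot by dropping back one level. */
static void scratch_exit(void)
{
    __sync_fetch_and_sub(&recursion_cnt, 1);
}
```

A program nested inside another (e.g. an irq-context prog firing while a task-context prog runs) gets slot 1 instead of slot 0, so neither clobbers the other's scratch memory, and the whole thing costs 3 buffers per CPU regardless of how many programs are attached.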