On Wed, Jan 25, 2023 at 11:59 AM Yaniv Agman <yanivagman@xxxxxxxxx> wrote:
>
> > Anyway back to preempt_disable(). Think of it as a giant spin_lock
> > that covers the whole program. In preemptable kernels it hurts
> > tail latency and fairness, and is completely unacceptable in RT.
> > That's why we moved to migrate_disable.
> > Technically we can add a bpf_preempt_disable() kfunc, but if we do that
> > we'll be back to square one. The issues with preemption and RT
> > will reappear. So let's figure out a different solution.
> > Why not use a scratch buffer per program?
>
> Totally understand the reason for avoiding preemption disable,
> especially in RT kernels.
> I believe the answer for why not to use a scratch buffer per program
> is simply memory space.
> In our use case, Tracee [1], we let the user choose whichever events to
> trace for a specific workload.
> This list of events is very big, and we have many BPF programs
> attached to different places in the kernel.
> Let's assume that we have 100 events, and for each event we have a
> different BPF program.
> Then having 32KB per-cpu scratch buffers translates to 3.2MB per
> cpu, and ~100MB for 32 CPUs, which is more common in our case.

Well, 100 bpf progs consume at least a page each, so you might want
one program attached to all events.

> Since we always add new events to Tracee, this will also not be scalable.
> Yet, if there is no other solution, I believe we will go in that direction.
>
> [1] https://github.com/aquasecurity/tracee/blob/main/pkg/ebpf/c/tracee.bpf.c

You're talking about
BPF_PERCPU_ARRAY(scratch_map, scratch_t, 1);
?
Instead of a scratch_map per program, use an atomic per-cpu counter for
recursion. You'll have 3 levels of nesting in the worst case, so it becomes:
BPF_PERCPU_ARRAY(scratch_map, scratch_t, 3);
On prog entry increment the recursion counter, on exit decrement it,
and use that level's scratch_t slot in the prog.
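To make the suggested pattern concrete, here is a minimal userspace C sketch of the idea: one 3-slot scratch array plus an atomic recursion counter per CPU, bumped on entry and dropped on exit, with the counter's old value selecting the slot. The names (scratch_t, MAX_NEST, scratch_enter/scratch_exit) are illustrative, not any real BPF API; in an actual BPF program the array would be the BPF_PERCPU_ARRAY shown above, the counter would live in another per-cpu map, and __sync_fetch_and_add would operate on the map value.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NEST 3                 /* worst case: task, softirq, hardirq */

typedef struct { char buf[32768]; } scratch_t;

/* Stand-ins for one CPU's slice of the per-cpu maps. Per-cpu data is
 * only ever touched by one CPU, so a plain array and int suffice here;
 * the atomics guard against preemption/interrupt nesting, not SMP. */
static scratch_t scratch_map[MAX_NEST];
static int recursion_cnt;

/* On prog entry: bump the counter and claim the scratch slot for this
 * nesting level. Returns NULL (prog should bail out) past MAX_NEST. */
static scratch_t *scratch_enter(void)
{
    int lvl = __sync_fetch_and_add(&recursion_cnt, 1);

    if (lvl >= MAX_NEST) {
        __sync_fetch_and_sub(&recursion_cnt, 1);
        return NULL;
    }
    return &scratch_map[lvl];
}

/* On prog exit: release the slot by dropping back one level. */
static void scratch_exit(void)
{
    __sync_fetch_and_sub(&recursion_cnt, 1);
}
```

A program nested inside another (e.g. an irq-context prog firing while a task-context prog runs) gets slot 1 instead of slot 0, so neither clobbers the other's scratch memory, and the whole thing costs 3 buffers per CPU regardless of how many programs are attached.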