On Tue, Dec 10, 2024 at 02:25:20PM +0100, Usama Saqib wrote:
> [ Adding x86 / scheduler folks to Cc given PREEMPT_LAZY as-is would cause
> serious regressions for us. ]
>
> On 11/18/24 10:14 AM, Usama Saqib wrote:
> > Hello,
> >
> > I hope everyone is doing well. It seems that work has started to
> > introduce a new preemption model in the linux kernel PREEMPT_LAZY [1].
> > According to the mailing list, the maintainers intend for this to
> > replace PREEMPT_NONE and PREEMPT_VOLUNTARY as the default preemption
> > model.
> >
> > From the changeset, it looks like PREEMPT_LAZY allows
> > irqentry_exit_cond_resched() to get called on IRQ exit. This change,
> > similar to PREEMPT_FULL, can get two bpf programs attached to a kprobe
> > or tracepoint running in user context, to nest. This currently causes
> > the nesting program to miss. I have been able to get these misses to
> > happen on top of this new patch.
> >
> > This behavior is currently not possible with the default preemption
> > model used in most distributions, PREEMPT_VOLUNTARY. For many products
> > using BPF for tracing/security, this would constitute a regression in
> > terms of reliability.
> >
> > My question is whether there is any ongoing work to fix this behavior
> > of kprobes and tracepoints, so they do not miss on nesting. I have
> > previously been told that there is ongoing work related to
> > bpf-specific spinlocks to resolve this problem [2]. Will that be
> > available by the time this is merged into the mainline, and the
> > current defaults deprecated?

I have no idea about the whole BPF thing, but if behaviour is as
PREEMPT_FULL, then there is nothing to fix from a scheduler PoV.

Note that most distros already build with PREEMPT_DYNAMIC, which allows
users/admins to dynamically select the preemption model (either at boot
or at runtime through debugfs).

If certain BPF stuff cannot deal with full preemption, then I would have
to call it broken.
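
For anyone who wants to check which model a PREEMPT_DYNAMIC kernel is
currently running, here is a minimal sketch. It assumes debugfs is
mounted at /sys/kernel/debug, that the file sched/preempt exists there,
and that it lists the available models on one line with the active one
in parentheses (for example "none voluntary (full)"). The helper names
are hypothetical, and reading the file normally requires root.

```python
from pathlib import Path

# Assumed location of the PREEMPT_DYNAMIC control file (debugfs must be mounted).
PREEMPT_PATH = Path("/sys/kernel/debug/sched/preempt")


def parse_preempt_modes(text):
    """Parse a line such as 'none voluntary (full)'.

    Returns (current, available), where current is the model shown in
    parentheses, or None if no token is parenthesised.
    """
    current = None
    available = []
    for token in text.split():
        if token.startswith("(") and token.endswith(")"):
            token = token[1:-1]
            current = token
        available.append(token)
    return current, available


def read_preempt_mode(path=PREEMPT_PATH):
    """Read the active preemption model, or (None, []) if it is unavailable."""
    try:
        return parse_preempt_modes(path.read_text())
    except OSError:
        # Kernel not built with PREEMPT_DYNAMIC, debugfs not mounted,
        # or insufficient permissions.
        return None, []


if __name__ == "__main__":
    current, available = read_preempt_mode()
    print("current:", current, "available:", available)
```

Under the same assumptions, writing one of the listed names to that file
switches the model at runtime, and the preempt= kernel command-line
parameter selects it at boot.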