Re: BPF and lazy preemption.

Usama Saqib <usama.saqib@xxxxxxxxxxxxx> · Tue, 10 Dec 2024 14:25:20 +0100

[ Adding x86 / scheduler folks to Cc given PREEMPT_LAZY as-is would cause
  serious regressions for us. ]

On 11/18/24 10:14 AM, Usama Saqib wrote:
> Hello,
>
> I hope everyone is doing well. It seems that work has started to
> introduce a new preemption model, PREEMPT_LAZY, into the Linux kernel
> [1]. According to the mailing list, the maintainers intend for it to
> replace PREEMPT_NONE and PREEMPT_VOLUNTARY as the default preemption
> model.
>
> From the changeset, it looks like PREEMPT_LAZY allows
> irqentry_exit_cond_resched() to be called on IRQ exit. As with full
> preemption (CONFIG_PREEMPT), this lets two BPF programs attached to
> kprobes or tracepoints and running in user context nest. Today this
> causes the inner (nested) program to miss, i.e. it is not run. I have
> been able to reproduce these misses with this patch series applied.
>
> This nesting cannot currently happen under PREEMPT_VOLUNTARY, the
> default preemption model in most distributions. For many products
> using BPF for tracing/security, this would constitute a regression in
> terms of reliability.
>
> My question is whether there is any ongoing work to fix this behavior
> of kprobes and tracepoints, so that they do not miss when nested. I
> have previously been told that there is ongoing work on BPF-specific
> spinlocks to resolve this problem [2]. Will that be available by the
> time this is merged into mainline and the current defaults are
> deprecated?
>
> Thanks,
> Usama Saqib.
>
> 1. https://lwn.net/ml/all/20241007074609.447006177@xxxxxxxxxxxxx/
> 2. https://lore.kernel.org/bpf/CAOzX8ixsxPbw1ke=DsDd_b38k1TE+JRG3LvJfh4wD60mhHvAqA@xxxxxxxxxxxxxx/T/#m206e33e5a0a0d9d3d498480a53aa9c87c81d91ff