Re: Are BPF programs preemptible?

Yonghong Song <yhs@xxxxxxxx> · Mon, 23 Jan 2023 09:02:02 -0800

On 1/23/23 4:30 AM, Yaniv Agman wrote:
Ok, thanks Jakub for the answer and references.
I must say that I am very surprised though. First, most of the
documentation for BPF says that preemption is disabled, like the
reference I gave [1] and even the bpf-helpers man page [2] says "Note
that all programs run with preemption disabled..." for the
bpf_get_smp_processor_id() helper. I think this is something that
deserves more attention, since many eBPF developers still assume that
eBPF programs are non-preemptible, and their programs might break on
newer kernels.

It would be great if you could send a patch to fix these
outdated comments!

I'm trying to figure out how I can solve this issue in our case - is
it correct to assume that no more than one preemption can happen
during a run of my bpf program? If so, I can try to write a percpu

No. It is possible that you have more than one preemption during the
same prog run. There is no restriction on this.

buffer with 2 entries, and give the second entry to the program that
interrupted the first one. But even then, I will need to find a way to
know if my program currently interrupts the run of another program -
is there a way to do that? Maybe checking if the current context is of
an interrupt, can this be done? Any other suggestions to solve this
problem?
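
To illustrate the reply above (more than one preemption can occur during
a single run), here is a minimal userspace C sketch of the two-entry
idea. It is a counting model only, not BPF code. The names
percpu_scratch, scratch_enter and scratch_exit are hypothetical. The
sketch assumes programs finish in the reverse order they started, which
preemption by the scheduler may not guarantee, and it ignores the fact
that a real counter update would itself have to be safe against
interruption.

```c
#include <stddef.h>

#define NUM_SLOTS 2   /* the two-entry buffer proposed above */

/* Hypothetical per-CPU state: one scratch slot per nesting level. */
struct percpu_scratch {
    int depth;                 /* programs currently in flight on this CPU */
    char slot[NUM_SLOTS][64];  /* scratch space, one entry per level */
};

/* Claim a slot on program entry; returns NULL when all slots are in use. */
static char *scratch_enter(struct percpu_scratch *s)
{
    if (s->depth >= NUM_SLOTS)
        return NULL;
    return s->slot[s->depth++];
}

/* Release the most recently claimed slot on program exit. */
static void scratch_exit(struct percpu_scratch *s)
{
    if (s->depth > 0)
        s->depth--;
}
```

With two preemptions in flight, the third scratch_enter() call on the
same CPU has no slot left and gets NULL, so a fixed pair of entries can
only work if the program is prepared to bail out in that case.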

[1]: https://docs.cilium.io/en/latest/bpf/toolchain
[2]: https://man7.org/linux/man-pages/man7/bpf-helpers.7.html

Thanks,
Yaniv

On Mon, Jan 23, 2023 at 12:54, Jakub Sitnicki
<jakub@xxxxxxxxxxxxxx> wrote:

On Mon, Jan 23, 2023 at 11:21 AM +02, Yaniv Agman wrote:
Hello!

Several places state that eBPF programs cannot be preempted by the
kernel (e.g. https://docs.cilium.io/en/latest/bpf/toolchain ), however,
I did see a strange behavior where an eBPF percpu map gets overridden,
and I'm trying to figure out if it's due to a bug in my program or
some misunderstanding I have about eBPF. What caught my eye was a
sentence in a LWN article (https://lwn.net/Articles/812503/ ) that
says: "Alexei thankfully enlightened me recently over a beer that the
real intent here is to guarantee that the program runs to completion
on the same CPU where it started".

So my question is - are BPF programs guaranteed to run from start to
end without being interrupted at all, or is the only guarantee that
they run on the same CPU, while IRQs (NMIs, soft irqs, whatever) can
still interrupt their run?

If the only guarantee is no migration, it means that a percpu map
cannot be safely used by two different BPF programs that can preempt
each other (e.g. some kprobe and a network cgroup program).
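
The hazard described above can be modelled in plain userspace C. This is
a deterministic simulation, not BPF code: cpu_slot stands in for one
CPU's entry of a per-CPU map, and prog_a_begin, prog_a_end, prog_b_run
and run_interleaved are hypothetical names chosen for illustration.

```c
#include <string.h>

/* Stand-in for one CPU's entry of a per-CPU map shared by two programs. */
static char cpu_slot[32];

/* First half of hypothetical program A: stage data in the shared slot. */
static void prog_a_begin(void)
{
    strcpy(cpu_slot, "A-data");
}

/* Second half of A: read back what it believes it staged. */
static const char *prog_a_end(void)
{
    return cpu_slot;
}

/* Hypothetical program B, running to completion on the same CPU. */
static void prog_b_run(void)
{
    strcpy(cpu_slot, "B-data");
}

/* A is preempted between its two halves and B runs in the gap. */
static const char *run_interleaved(void)
{
    prog_a_begin();
    prog_b_run();          /* preemption point: same CPU, no migration */
    return prog_a_end();   /* A now sees B's data, not its own */
}
```

Staying on the same CPU does not help here: both programs resolve to the
same per-CPU entry, so A reads back whatever B wrote.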

Since v5.7 BPF program runners use migrate_disable() instead of
preempt_disable(). See commit 2a916f2f546c ("bpf: Use
migrate_disable/enable in array macros and cgroup/lirc code.") [1].

But at that time migrate_disable() was merely an alias for
preempt_disable() on !CONFIG_PREEMPT_RT kernels.

Since v5.11 migrate_disable() no longer disables preemption on
!CONFIG_PREEMPT_RT kernels. See commit 74d862b682f5 ("sched: Make
migrate_disable/enable() independent of RT") [2].

So, yes, you are right, but it depends on the kernel version.
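
As a rough summary of the version boundaries above, the following C
sketch maps a mainline version number to the guard the BPF program
runners use. It covers !CONFIG_PREEMPT_RT kernels only and does not
account for distribution or stable-tree backports. The enum and the
function name bpf_runner_guard_for are hypothetical and are not kernel
APIs.

```c
#include <assert.h>

/* What protects a BPF program run on a !CONFIG_PREEMPT_RT kernel. */
enum bpf_runner_guard {
    GUARD_PREEMPT_DISABLE,  /* before v5.7: preempt_disable() */
    GUARD_MIGRATE_ALIAS,    /* v5.7 to v5.10: migrate_disable(), still an
                               alias for preempt_disable() */
    GUARD_MIGRATE_ONLY      /* v5.11 and later: migration disabled,
                               preemption possible */
};

/* Classify a mainline kernel version (major.minor). */
static enum bpf_runner_guard bpf_runner_guard_for(int major, int minor)
{
    assert(major >= 0 && minor >= 0);

    if (major < 5 || (major == 5 && minor < 7))
        return GUARD_PREEMPT_DISABLE;
    if (major == 5 && minor < 11)
        return GUARD_MIGRATE_ALIAS;
    return GUARD_MIGRATE_ONLY;
}
```

Only the last case, GUARD_MIGRATE_ONLY, allows one BPF program to be
preempted by another on the same CPU.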

PS. The migrate_disable vs per-CPU data problem is also covered in [3].

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a916f2f546ca1c1e3323e2a4269307f6d9890eb
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=74d862b682f51e45d25b95b1ecf212428a4967b0
[3]: https://lwn.net/Articles/836503/