BPF memory model

Josh Don <joshdon@xxxxxxxxxx> · Thu, 7 Sep 2023 15:00:56 -0700

Hi Paul,

I was chatting with Dave Marchevsky about the BPF memory model, and
had some follow-up questions you might be able to answer.

I've been using the built-in RMW operations to do a lot of lockless
programming, both for concurrent BPF-BPF and especially for
userspace-BPF (the latter has become a lot more interesting
with the sched_ext work from Meta). It would of course be nice to
sometimes lower the synchronization overhead to a hardware barrier or
a compiler barrier, to allow for general-purpose acquire/release
semantics (rather than needing to fall back to a lock RMW instruction). I saw
your presentation from 2021 on this topic here:
https://lpc.events/event/11/contributions/941/attachments/859/1667/bpf-memory-model.2020.09.22a.pdf

Has there been any further interest in supporting additional
kernel-style atomics in BPF that you know of?
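
To make the request concrete, here is a minimal sketch of the
message-passing pattern in question. It is written with userspace C11
atomics as a stand-in, not as BPF code; the struct and function names
are hypothetical. publish_release()/consume_acquire() show the
acquire/release form that would be nice to have, and publish_rmw()
shows the heavier fallback of using a fully ordered RMW to publish.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared slot: a payload guarded by a "ready" flag. */
struct mailbox {
	int payload;
	atomic_int ready;
};

/* Desired form: plain store to the payload, then a release store. */
static void publish_release(struct mailbox *m, int value)
{
	m->payload = value;
	atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Desired form: acquire load of the flag, then a plain payload load. */
static bool consume_acquire(struct mailbox *m, int *out)
{
	if (!atomic_load_explicit(&m->ready, memory_order_acquire))
		return false;
	*out = m->payload;
	return true;
}

/* Fallback form: publish with a sequentially consistent exchange. */
static void publish_rmw(struct mailbox *m, int value)
{
	m->payload = value;
	atomic_exchange(&m->ready, 1);
}
```

In kernel terms the first two functions correspond roughly to
smp_store_release() and smp_load_acquire().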

And on a different BPF note, one thing I wasn't sure about was the
ability of the CPU to reorder loads and stores across the BPF program
call boundary. For example, could the load of "z" in the BPF program
below be reordered before the store to x in the kernel? I'm sure that
no compiler barrier is ever necessary here since the BPF program is
compiled separately from the kernel, but I'm not sure whether a
hardware barrier is necessary.
<kernel>
x = 3;
call_bpf();
  <bpf>
  int y = z;
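
For reference, the same store-then-load shape can be sketched in
userspace C11 (again a stand-in, not kernel or BPF code; all names
here are hypothetical). The optional fence is what would forbid the
store-to-load reordering if a hardware barrier does turn out to be
required; whether the call path into the BPF program already provides
equivalent ordering is exactly the open question.

```c
#include <stdatomic.h>

static atomic_int x;
static atomic_int z;

/* Stands in for the BPF program: a relaxed load of z. */
static int bpf_prog_stand_in(void)
{
	return atomic_load_explicit(&z, memory_order_relaxed);
}

/* Stands in for the kernel side: store to x, then call the program. */
static int kernel_side(int with_fence)
{
	atomic_store_explicit(&x, 3, memory_order_relaxed);
	if (with_fence)
		/* Full fence: orders the store to x before the load of z. */
		atomic_thread_fence(memory_order_seq_cst);
	return bpf_prog_stand_in();
}
```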

Best,
Josh