On Fri, Sep 8, 2023 at 1:43 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Thu, Sep 07, 2023 at 03:00:56PM -0700, Josh Don wrote:
> > Has there been any further interest in supporting additional
> > kernel-style atomics in BPF that you know of?
>
> This is one of the first that I have heard of. ;-)
>
> But what BPF programs are you running that are seeing excessive
> synchronization overhead? That will tell us which operations to
> start with. (Or maybe it is time to just add the full Linux-kernel
> atomic-operations kitchen sink, but that would not normally be the way
> to bet.)

I'm writing BPF programs for scheduling (i.e., sched_ext), so these are
invoked in hot paths and concurrently across multiple CPUs (for example,
pick_next_task, enqueue_task, etc.). The kernel is responsible for
relaying ground truth, userspace makes O(ms) scheduling decisions, and
BPF makes O(us) scheduling decisions. BPF-BPF concurrency can be handled
with spinlocks and RMW operations, while BPF-userspace concurrency can
currently only really use RMW.

My line of questioning is more forward-looking: I'm thinking ahead about
how to achieve kernel-like scheduling performance, since a BPF spinlock
or RMW is sometimes overkill :)

I would think that barrier() and smp_mb() would be the minimum viable
set (at least for x86) that people would find useful, but maybe others
can chime in.

> > And on a different BPF note, one thing I wasn't sure about was the
> > ability of the cpu to reorder loads and stores across the BPF program
> > call boundary. For example, could the load of "z" in the BPF program
> > below be reordered before the store to x in the kernel? I'm sure that
> > no compiler barrier is ever necessary here since the BPF program is
> > compiled separately from the kernel, but I'm not sure whether a
> > hardware barrier is necessary.
> >
> > <kernel>
> > x = 3
> > call_bpf();
> >
> > <bpf>
> > int y = z;
>
> Given that a major goal of BPF is the ability to add low-overhead
> programs to code on fastpaths, I would not expect any implicit barriers
> in that case. Consider for example counting the number of calls to a
> "hot" function in the Linux kernel, in which case adding full ordering
> would incur unacceptable performance degradation. I would instead
> expect that the BPF program would need to add explicit barriers or
> ordered RMW operations.

Yep, that was my expectation as well. On the plus side, this gives the
flexibility of adding barriers only where they are really needed.

Best,
Josh
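The store-then-load question above is the classic store-buffering (SB) pattern, and it only becomes observable once another CPU does the mirror image (stores z, then loads x). The sketch below is a userspace stand-in, not kernel or BPF code: it assumes C11 atomics and pthreads, uses atomic_thread_fence(memory_order_seq_cst) as a rough analogue of smp_mb(), and the second CPU's accesses plus the names cpu0, cpu1 and count_reordered are hypothetical. With the fence on both sides, C11 forbids the outcome in which both loads miss the other CPU's store; without it, x86 permits that outcome because its store buffer lets a later load pass an earlier store. A compiler-only barrier such as barrier() would not forbid it either.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared locations: x is stored by the kernel before
 * call_bpf(), z is loaded by the BPF program. */
static atomic_int x, z;
static int y_seen, x_seen;  /* values each CPU observed in one trial */
static int with_fence;      /* 1: full fence on both CPUs, 0: no fence */

/* Rough userspace analogue of smp_mb(); not the kernel macro itself. */
static void maybe_full_fence(void) {
    if (with_fence)
        atomic_thread_fence(memory_order_seq_cst);
}

/* CPU 0: kernel does "x = 3", then the BPF program does "int y = z". */
static void *cpu0(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 3, memory_order_relaxed);
    maybe_full_fence(); /* explicit barrier the BPF program would add */
    y_seen = atomic_load_explicit(&z, memory_order_relaxed);
    return NULL;
}

/* CPU 1 (hypothetical): another CPU publishes z, then checks x. */
static void *cpu1(void *arg) {
    (void)arg;
    atomic_store_explicit(&z, 1, memory_order_relaxed);
    maybe_full_fence(); /* the fence is needed on this side too */
    x_seen = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the SB test `trials` times and returns how often both loads
 * returned 0 (i.e., the load of z was effectively reordered before the
 * store to x), or -1 if a thread could not be created. */
static int count_reordered(int fence, int trials) {
    assert(trials > 0);
    int hits = 0;
    with_fence = fence;
    for (int i = 0; i < trials; i++) {
        pthread_t t0, t1;
        atomic_store(&x, 0);
        atomic_store(&z, 0);
        if (pthread_create(&t0, NULL, cpu0, NULL) != 0)
            return -1;
        if (pthread_create(&t1, NULL, cpu1, NULL) != 0) {
            pthread_join(t0, NULL);
            return -1;
        }
        /* Joining makes y_seen/x_seen safely readable here. */
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (y_seen == 0 && x_seen == 0)
            hits++;
    }
    return hits;
}
```

Calling count_reordered(1, 20000) from a main() must return 0 under the C11 rules for seq_cst fences, while count_reordered(0, 20000) is allowed to return a nonzero count on x86. Because each trial spawns fresh threads, the two CPUs rarely overlap, so the unfenced outcome may show up only occasionally in this harness. Dropping the fence on just one CPU is also enough to permit the outcome again, which is why both sides need explicit ordering.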