On Mon, Sep 18, 2023 at 11:09:26AM -0400, Barret Rhoden wrote:
> On 9/8/23 04:42, Paul E. McKenney wrote:
> > But what BPF programs are you running that are seeing excessive
> > synchronization overhead?  That will tell us which operations to start
> > with.  (Or maybe it is time to just add the full Linux-kernel
> > atomic-operations kitchen sink, but that would not normally be the way
> > to bet.)
> 
> Here's what I use in BPF, (also for writing parallel schedulers):
> - READ_ONCE/WRITE_ONCE
> - compiler atomic builtins, like CAS, swap/exchange, fetch_and_add, etc.
> - smp_store_release, __atomic_load_n, etc.
> - at one point, i was sprinkling asm volatile ("" ::: "memory") around too,
>   though not in any active code at the moment.

Good to know, thank you very much!!!

> My mental model, right or wrong, is that I am operating under something like
> the LKMM, and that I need to convince the compiler to spit out the right
> code (sort of like writing shared memory code to talk to a device or
> userspace) and hope the JIT does the right thing.

Just to make sure that I understand, the idea is to compile from (say)
__atomic_load_n() to BPF instructions, correct?  Or is this compiling all
the way to the target x86/ARMv8/whatever machine instructions?

							Thanx, Paul
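
For concreteness, the operations listed above typically show up in BPF C
code along the following lines.  This is a minimal sketch, not code from
this thread: the struct and function names are invented for illustration,
and it assumes a recent clang targeting BPF (e.g. -target bpf -mcpu=v3)
so that the fetch_and_add/CAS builtins can be lowered to BPF atomic
instructions.

/* The usual volatile-access definitions of READ_ONCE()/WRITE_ONCE(). */
#define READ_ONCE(x)		(*(volatile typeof(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile typeof(x) *)&(x) = (v))

struct counter {
	unsigned long hits;
	unsigned long seq;
};

static void update(struct counter *c)
{
	unsigned long old, new;

	/* Once-accesses: no tearing, no compiler refetch. */
	old = READ_ONCE(c->seq);
	WRITE_ONCE(c->seq, old + 1);

	/* Compiler atomic builtins: fetch_and_add, load, and CAS. */
	__atomic_fetch_add(&c->hits, 1, __ATOMIC_RELAXED);

	old = __atomic_load_n(&c->hits, __ATOMIC_ACQUIRE);
	do {
		new = old + 1;
	} while (!__atomic_compare_exchange_n(&c->hits, &old, new, 0,
					      __ATOMIC_RELEASE,
					      __ATOMIC_RELAXED));

	/* smp_store_release()-style publication via the store builtin. */
	__atomic_store_n(&c->seq, old + 1, __ATOMIC_RELEASE);

	/* Compiler-only barrier, as mentioned above. */
	asm volatile ("" ::: "memory");
}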