Hello,

On Fri, Sep 08, 2023 at 01:26:11PM -0700, Josh Don wrote:
> I'm writing BPF programs for scheduling (ie. sched_ext), so these are
> getting invoked in hot paths and invoked concurrently across multiple
> cpus (for example, pick_next_task, enqueue_task, etc.). The kernel is
> responsible for relaying ground truth, userspace makes O(ms)
> scheduling decisions, and BPF makes O(us) scheduling decisions.
> BPF-BPF concurrency is possible with spinlocks and RMW, BPF-userspace
> can currently only really use RMW. My line of questioning is more
> forward looking, as I'm preemptively thinking of how to ensure
> kernel-like scheduling performance, since BPF spinlock or RMW is
> sometimes overkill :) I would think that barrier() and smp_mb() would
> probably be the minimum viable set (at least for x86) that people
> would find useful, but maybe others can chime in.

My personal favorite set is store_release/load_acquire(). I have a hard
time thinking up cases which can't be covered by them and they're
basically free on x86.

Thanks.

--
tejun