Hello,

On Thu, Jul 13, 2023 at 11:32:37AM -0700, Linus Torvalds wrote:
> On Thu, 13 Jul 2023 at 06:46, Andrea Righi <andrea.righi@xxxxxxxxxxxxx> wrote:
> >
> > I'm not sure if we already have an equivalent of
> > smp_store_release_u64()/smp_load_acquire_u64(). Otherwise, it may be
> > worth adding them to a more generic place.
>
> Yeah, a 64-bit atomic load/store is not necessarily even possible on
> 32-bit architectures.
>
> And when it *is* possible, it might be very very expensive indeed (eg
> on 32-bit x86, the way to do a 64-bit load would be with "cmpxchg8b",
> which is ridiculously slow)

There are two places where sched_ext depends on atomic load/store. One is
pnt_seq, which uses smp_store_release()/smp_load_acquire(). The other is
task_struct->scx.ops_state, which uses atomic64_read_acquire() and
atomic64_store_release(). atomic64's are implemented with spinlocks on
32-bit by default, which is probably why Andrea didn't hit it.

pnt_seq is a per-CPU counter of successful pick_next_task()'s from
sched_ext and is used to tell "has at least one pick_next_task()
succeeded after my kicking that CPU".

p->scx.ops_state has an embedded qseq counter (2 bits for state flags,
the rest for the counter; I gotta change the masks to macros too) which
is used to detect whether the task has been dequeued and re-enqueued
while a CPU is trying to double-lock rq's for task migration.

As both are used to detect races in very short and immediate time
windows, 32 bits and 30 bits, respectively, should be safe in practice.
e.g. while it's theoretically possible for the task to be dequeued and
re-enqueued exactly 2^30 times while the CPU is trying to switch rq
locks, I don't think that's practically possible without something going
very wrong with the machine (e.g. NMI / SMI).

I'll note the above and change both to unsigned longs.

Thanks.

--
tejun