Hello,

On Thu, Jul 13, 2023 at 11:32:37AM -0700, Linus Torvalds wrote:
> On Thu, 13 Jul 2023 at 06:46, Andrea Righi <andrea.righi@xxxxxxxxxxxxx> wrote:
> >
> > I'm not sure if we already have an equivalent of
> > smp_store_release_u64()/smp_load_acquire_u64(). Otherwise, it may be
> > worth adding them to a more generic place.
>
> Yeah, a 64-bit atomic load/store is not necessarily even possible on
> 32-bit architectures.
>
> And when it *is* possible, it might be very very expensive indeed (eg
> on 32-bit x86, the way to do a 64-bit load would be with "cmpxchg8b",
> which is ridiculously slow)

There are two places where sched_ext depends on atomic load/store. One is
pnt_seq, which uses smp_store_release()/smp_load_acquire(). The other is
task_struct->scx.ops_state, which uses atomic64_read_acquire() and
atomic64_store_release(). atomic64's are implemented with spinlocks on
32-bit by default, which is probably why Andrea didn't hit it.

pnt_seq is a per-CPU counter of successful pick_next_task()'s from
sched_ext and is used to tell "has at least one pick_next_task()
succeeded after my kicking that CPU".

p->scx.ops_state has an embedded qseq counter (2 bits for state flags,
the rest for the counter; I gotta change the masks to macros too) which
is used to detect whether the task has been dequeued and re-enqueued
while a CPU is trying to double-lock rq's for task migration.

As both are used to detect races in very short and immediate time
windows, 32 bits and 30 bits, respectively, should be safe in practice.
e.g. while it's theoretically possible for the task to be dequeued and
re-enqueued exactly 2^30 times while the CPU is trying to switch rq
locks, I don't think that's practically possible without something going
very wrong with the machine (e.g. NMI / SMI).

I'll note the above and change both to unsigned longs.

Thanks.

--
tejun