On Tue, 7 Jan 2025 at 20:22, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 07, 2025 at 05:59:50AM -0800, Kumar Kartikeya Dwivedi wrote:
> > +	if (val & _Q_LOCKED_MASK) {
> > +		RES_RESET_TIMEOUT(ts);
> > +		smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret));
> > +	}
>
> Please check how smp_cond_load_acquire() works on ARM64 and then add
> some words on how RES_CHECK_TIMEOUT() is still okay.

Thanks Peter,

The __cmpwait_relaxed bit does indeed look problematic. My understanding
is that the ldxr + wfe sequence can get stuck because we may never see an
update on the &lock->locked address, and we'll never call into
RES_CHECK_TIMEOUT, since that cond_expr check precedes the __cmpwait
macro. I realized the sevl is only there so we don't get stuck on the
first wfe on entry; it won't unblock other CPUs' WFE, so things are
incorrect as-is. In any case, this is all too fragile to rely upon, so it
should be fixed.

Do you have suggestions on resolving this? We want to invoke this macro
as part of the waiting loop. We could have a
rqspinlock_smp_cond_load_acquire() that maps to a no-WFE
smp_load_acquire() loop on arm64 and uses the asm-generic version
elsewhere.