On Tue, Jan 07, 2025 at 08:17:56PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 07, 2025 at 10:44:16PM +0530, Kumar Kartikeya Dwivedi wrote:
> > On Tue, 7 Jan 2025 at 20:22, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Jan 07, 2025 at 05:59:50AM -0800, Kumar Kartikeya Dwivedi wrote:
> > > > +	if (val & _Q_LOCKED_MASK) {
> > > > +		RES_RESET_TIMEOUT(ts);
> > > > +		smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret));
> > > > +	}
> > >
> > > Please check how smp_cond_load_acquire() works on ARM64 and then add
> > > some words on how RES_CHECK_TIMEOUT() is still okay.
> >
> > Thanks Peter,
> >
> > The __cmpwait_relaxed bit does indeed look problematic; my
> > understanding is that the ldxr + wfe sequence can get stuck because we
> > may not see any updates on the &lock->locked address, and we'll not
> > call into RES_CHECK_TIMEOUT since that cond_expr check precedes the
> > __cmpwait macro.
>
> IIRC the WFE will wake at least on every interrupt but might have an
> inherent timeout itself, so it will make some progress, but not at a
> speed comparable to a pure spin.
>
> > Do you have suggestions on resolving this? We want to invoke this
> > macro as part of the waiting loop. We can have a
> > rqspinlock_smp_cond_load_acquire that maps to a no-WFE
> > smp_load_acquire loop on arm64 and to the asm-generic version
> > elsewhere.
>
> That will make arm64 sad -- that wfe thing is how they get away with not
> having paravirt spinlocks iirc. Also power consumption.
>
> I've not read well enough to remember what order of timeout you're
> looking for, but you could have the tick sample the lock like a
> watchdog would, and write a magic 'lock' value when it is deemed stuck.

Oh, there is this thread:

  https://lkml.kernel.org/r/20241107190818.522639-1-ankur.a.arora@xxxxxxxxxx

That seems to add exactly what you need -- with the caveat that the
arm64 people will of course have to accept it first :-)
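
For context on the __cmpwait_relaxed issue above: the shape of arm64's
smp_cond_load_acquire() is roughly the following (a simplified sketch;
see arch/arm64/include/asm/barrier.h for the real definition):

	#define smp_cond_load_acquire(ptr, cond_expr)			\
	({								\
		typeof(ptr) __PTR = (ptr);				\
		__unqual_scalar_typeof(*ptr) VAL;			\
		for (;;) {						\
			VAL = smp_load_acquire(__PTR);			\
			if (cond_expr)					\
				break;					\
			__cmpwait_relaxed(__PTR, VAL); /* LDXR + WFE */	\
		}							\
		(typeof(*ptr))VAL;					\
	})

RES_CHECK_TIMEOUT() lives inside cond_expr, so it is only re-evaluated
after __cmpwait_relaxed() wakes up; if nobody stores to &lock->locked,
the WFE can sleep far past the intended deadline.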
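
The fallback Kumar describes could look something like this minimal
sketch (the rqspinlock_smp_cond_load_acquire name is from the mail; the
body is an assumption about what such a mapping might be):

	#ifdef CONFIG_ARM64
	/*
	 * Poll with a plain load-acquire so that cond_expr -- and with
	 * it RES_CHECK_TIMEOUT() -- runs on every iteration.
	 */
	#define rqspinlock_smp_cond_load_acquire(ptr, cond_expr)	\
	({								\
		__unqual_scalar_typeof(*(ptr)) VAL;			\
		for (;;) {						\
			VAL = smp_load_acquire(ptr);			\
			if (cond_expr)					\
				break;					\
			cpu_relax();					\
		}							\
		VAL;							\
	})
	#else
	#define rqspinlock_smp_cond_load_acquire(ptr, cond_expr)	\
		smp_cond_load_acquire(ptr, cond_expr)
	#endif

As Peter notes, this trades away exactly what WFE buys arm64: the
event-driven wait that stands in for paravirt spinlocks, and the power
saved by not spinning.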
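
Peter's tick/watchdog idea, as a rough sketch -- the helper, field
names, and the RES_STUCK_MAGIC / RES_TIMEOUT_NS values below are
hypothetical illustration, not existing kernel API:

	#define RES_STUCK_MAGIC	0xff	/* hypothetical poison byte */

	/*
	 * Called from the tick. @started would be recorded when the
	 * waiter began spinning. The store to lock->locked both gives
	 * cond_expr something new to observe and, on arm64, dirties the
	 * monitored cacheline so a pending WFE wakes up.
	 */
	static void rqspinlock_watchdog(struct qspinlock *lock, u64 started, u64 now)
	{
		if (READ_ONCE(lock->locked) && now - started > RES_TIMEOUT_NS)
			WRITE_ONCE(lock->locked, RES_STUCK_MAGIC);
	}

Waiters would then treat VAL == RES_STUCK_MAGIC (or the next
RES_CHECK_TIMEOUT() evaluation after the wakeup) as a timeout and back
off.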