On Wed, 8 Jan 2025 at 00:52, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 07, 2025 at 08:17:56PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 07, 2025 at 10:44:16PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > On Tue, 7 Jan 2025 at 20:22, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Jan 07, 2025 at 05:59:50AM -0800, Kumar Kartikeya Dwivedi wrote:
> > > > > +	if (val & _Q_LOCKED_MASK) {
> > > > > +		RES_RESET_TIMEOUT(ts);
> > > > > +		smp_cond_load_acquire(&lock->locked, !VAL || RES_CHECK_TIMEOUT(ts, ret));
> > > > > +	}
> > > >
> > > > Please check how smp_cond_load_acquire() works on ARM64 and then add
> > > > some words on how RES_CHECK_TIMEOUT() is still okay.
> > >
> > > Thanks Peter,
> > >
> > > The __cmpwait_relaxed bit does indeed look problematic, my
> > > understanding is that the ldxr + wfe sequence can get stuck because we
> > > may not have any updates on the &lock->locked address, and we’ll not
> > > call into RES_CHECK_TIMEOUT since that cond_expr check precedes the
> > > __cmpwait macro.
> >
> > IIRC the WFE will wake at least on every interrupt but might have an
> > inherent timeout itself, so it will make some progress, but not at a
> > speed comparable to a pure spin.

Yes, also, it is possible to have interrupts disabled (e.g. for
irqsave spin lock calls).

> >
> > > Do you have suggestions on resolving this? We want to invoke this
> > > macro as part of the waiting loop. We can have a
> > > rqspinlock_smp_cond_load_acquire that maps to no-WFE smp_load_acquire
> > > loop on arm64 and uses the asm-generic version elsewhere.
> >
> > That will make arm64 sad -- that wfe thing is how they get away with not
> > having paravirt spinlocks iirc. Also power consumption.
> >

Makes sense.

> > I've not read well enough to remember what order of timeout you're
> > looking for, but you could have the tick sample the lock like a watchdog
> > like, and write a magic 'lock' value when it is deemed stuck.
>
> Oh, there is this thread:
>
>   https://lkml.kernel.org/r/20241107190818.522639-1-ankur.a.arora@xxxxxxxxxx
>
> That seems to add exactly what you need -- with the caveat that the
> arm64 people will of course have to accept it first :-)

This seems perfect, thanks. While it adds a relaxed variant, it can be
extended with an acquire variant as well.
I will make use of this once it lands; it looks pretty close.
Until then, I think falling back to a non-WFE loop is the best course.
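
For reference, here is a rough userspace sketch of what such a non-WFE
fallback loop could look like. It is only an illustration under stated
assumptions: C11 atomics stand in for the kernel's smp_load_acquire(), and
struct res_timeout, res_check_timeout() and res_wait_unlocked_nowfe() are
hypothetical names, not the RES_* macros from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical timeout state; the real RES_* macros in the patch differ. */
struct res_timeout {
	uint64_t budget_ns;	/* how long to wait before giving up */
	uint64_t deadline_ns;	/* absolute deadline, valid once armed */
	unsigned int spin;	/* iteration counter, amortizes clock reads */
	bool armed;		/* deadline_ns has been computed */
};

static uint64_t now_ns(void)
{
	struct timespec t;

	clock_gettime(CLOCK_MONOTONIC, &t);
	return (uint64_t)t.tv_sec * 1000000000ull + (uint64_t)t.tv_nsec;
}

/*
 * Returns true once the budget has elapsed. The clock is only read on
 * every 1024th call, so most iterations stay cheap.
 */
static bool res_check_timeout(struct res_timeout *ts)
{
	if ((ts->spin++ & 1023) != 0)
		return false;

	if (!ts->armed) {
		ts->deadline_ns = now_ns() + ts->budget_ns;
		ts->armed = true;
		return false;
	}
	return now_ns() >= ts->deadline_ns;
}

/*
 * Pure spin: reload with acquire ordering and re-evaluate the timeout on
 * every pass, so the check runs even if nobody ever stores to *locked.
 * Returns 0 once the lock byte reads as clear, -ETIMEDOUT otherwise.
 */
static int res_wait_unlocked_nowfe(_Atomic uint8_t *locked,
				   struct res_timeout *ts)
{
	for (;;) {
		if (!atomic_load_explicit(locked, memory_order_acquire))
			return 0;
		if (res_check_timeout(ts))
			return -ETIMEDOUT;
	}
}
```

The point of this shape is that the timeout is evaluated on each pass
through the loop rather than only after a store to the lock word, at the
cost of the power and virtualization downsides noted above.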