On Thu, Apr 15, 2021 at 04:37:58PM +0100, Catalin Marinas wrote: > On Thu, Apr 15, 2021 at 04:28:21PM +0100, Will Deacon wrote: > > On Thu, Apr 15, 2021 at 05:03:58PM +0200, Peter Zijlstra wrote: > > > diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c > > > index 4786dd271b45..22aeccc363ca 100644 > > > --- a/kernel/locking/qrwlock.c > > > +++ b/kernel/locking/qrwlock.c > > > @@ -60,6 +60,8 @@ EXPORT_SYMBOL(queued_read_lock_slowpath); > > > */ > > > void queued_write_lock_slowpath(struct qrwlock *lock) > > > { > > > + u32 cnt; > > > + > > > /* Put the writer into the wait queue */ > > > arch_spin_lock(&lock->wait_lock); > > > > > > @@ -73,9 +75,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock) > > > > > > /* When no more readers or writers, set the locked flag */ > > > do { > > > - atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING); > > > - } while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING, > > > - _QW_LOCKED) != _QW_WAITING); > > > + cnt = atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING); > > > > I think the issue is that >here< a concurrent reader in interrupt context > > can take the lock and release it again, but we could speculate reads from > > the critical section up over the later release and up before the control > > dependency here... > > > > > + } while (!atomic_try_cmpxchg_relaxed(&lock->cnts, &cnt, _QW_LOCKED)); > > > > ... and then this cmpxchg() will succeed, so our speculated stale reads > > could be used. > > > > *HOWEVER* > > > > Speculating a read should be fine in the face of a concurrent _reader_, > > so for this to be an issue it implies that the reader is also doing some > > (atomic?) updates. > > There's at least one such case: see chain_epi_lockless() updating > epi->next, called from ep_poll_callback() with a read_lock held. This > races with ep_done_scan() which has the write_lock held. Do you know if that's the code which triggered this patch? If so, it would be great to have this information in the changelog! > I think the authors of the above code interpreted the read_lock as > something that multiple threads can own disregarding the _read_ part. Using RmW atomics should be fine for that; it's no worse than some of the tricks pulled in RCU read context in the dentry cache (but then again, what is? ;) Will