On Fri, 17 Jun 2022 14:57:55 -0400 Waiman Long wrote:
> On 6/17/22 13:45, Eric Dumazet wrote:
>> On Fri, Jun 17, 2022 at 7:42 PM Waiman Long <longman@xxxxxxxxxx> wrote:
>>> On 6/17/22 11:24, Eric Dumazet wrote:
>>>> On Fri, Jun 17, 2022 at 5:00 PM Waiman Long <longman@xxxxxxxxxx> wrote:
>>>>> On 6/17/22 10:57, Shakeel Butt wrote:
>>>>>> On Fri, Jun 17, 2022 at 7:43 AM Waiman Long <longman@xxxxxxxxxx> wrote:
>>>>>>> On 6/17/22 08:07, Peter Zijlstra wrote:
>>>>>>>> On Fri, Jun 17, 2022 at 02:10:39AM -0700, Eric Dumazet wrote:
>>>>>>>>> --- a/kernel/locking/qrwlock.c
>>>>>>>>> +++ b/kernel/locking/qrwlock.c
>>>>>>>>> @@ -23,16 +23,6 @@ void queued_read_lock_slowpath(struct qrwlock *lock)
>>>>>>>>>  	/*
>>>>>>>>>  	 * Readers come here when they cannot get the lock without waiting
>>>>>>>>>  	 */
>>>>>>>>> -	if (unlikely(in_interrupt())) {
>>>>>>>>> -		/*
>>>>>>>>> -		 * Readers in interrupt context will get the lock immediately
>>>>>>>>> -		 * if the writer is just waiting (not holding the lock yet),
>>>>>>>>> -		 * so spin with ACQUIRE semantics until the lock is available
>>>>>>>>> -		 * without waiting in the queue.
>>>>>>>>> -		 */
>>>>>>>>> -		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
>>>>>>>>> -		return;
>>>>>>>>> -	}
>>>>>>>>>  	atomic_sub(_QR_BIAS, &lock->cnts);
>>>>>>>>>
>>>>>>>>>  	trace_contention_begin(lock, LCB_F_SPIN | LCB_F_READ);
>>>>>>>> This is known to break tasklist_lock.
>>>>>>>>
>>>>>>> We certainly can't break the current usage of tasklist_lock.
>>>>>>>
>>>>>>> I am aware of this problem with the networking code and am thinking
>>>>>>> about either relaxing the check to exclude softirq or providing a
>>>>>>> read_lock_unfair() variant for networking use.
>>>>>> read_lock_unfair() for networking use or tasklist_lock use?
>>>>> I meant to say read_lock_fair(), but it could also be the other way
>>>>> around. Thanks for spotting that.
>>>>>
>>>> If only tasklist_lock is problematic and needs the unfair variant,
>>>> then changing a few read_lock() calls for tasklist_lock will be less
>>>> invasive than ~1000 read_lock() calls elsewhere...
>>> On second thought, I think the right way is to introduce a fair
>>> variant, if needed. If an arch isn't using qrwlock, the native rwlock
>>> implementation will be unfair. In that sense, the unfair rwlock is the
>>> default. We would only need to change the relevant networking
>>> read_lock() calls to use the fair variant, which will still be unfair
>>> if qrwlock isn't used. We would not touch other read_lock() calls that
>>> don't care about fairness.
>>>
>> Hmm... backporting this kind of invasive change to stable kernels will
>> be a daunting task.
>>
>> Were rwlocks always unfair, and have we just been lucky?
>>
> Yes, rwlocks were always unfair, and they always had this kind of soft
> lockup problem and scalability problem because of cacheline bouncing.
> That was the reason for creating qrwlock, which can at least provide a
> fair rwlock in task context. Now we have systems with more and more
> CPUs, and that is why you are seeing it all over again with the
> networking code.

There is no fair play without paying a price. If the writer wants to play
a fair game with readers, it has to wait in the queue for at least a tick
before it forces readers to take the slow path by setting _QW_WAITING.

Only thoughts for now.
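For illustration only, not a patch: a fair read-side variant, as discussed
above, could simply be the existing queued_read_lock_slowpath() with the
in_interrupt() bypass dropped, so every reader queues behind a waiting
writer. The name queued_read_lock_slowpath_fair() below is made up for the
sketch; tasklist_lock and other existing callers would keep today's path.

/* Sketch only: as if added next to the existing slowpath in
 * kernel/locking/qrwlock.c, which already has the needed includes. */
void queued_read_lock_slowpath_fair(struct qrwlock *lock)
{
	/* Give back the reader bias taken in the fastpath; we will queue. */
	atomic_sub(_QR_BIAS, &lock->cnts);

	/* Put the reader into the wait queue. */
	arch_spin_lock(&lock->wait_lock);
	atomic_add(_QR_BIAS, &lock->cnts);

	/* Spin with ACQUIRE semantics until no writer holds the lock. */
	atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));

	/* Signal the next one in the queue to become the queue head. */
	arch_spin_unlock(&lock->wait_lock);
}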
Hillf

+++ b/kernel/locking/qrwlock.c
@@ -22,16 +22,20 @@ void queued_read_lock_slowpath(struct qr
 	/*
 	 * Readers come here when they cannot get the lock without waiting
 	 */
+	if (_QW_WAITING & atomic_read(&lock->cnts))
+		goto queue;
+
 	if (unlikely(in_interrupt())) {
 		/*
 		 * Readers in interrupt context will get the lock immediately
-		 * if the writer is just waiting (not holding the lock yet),
+		 * if the writer is not waiting long enough,
 		 * so spin with ACQUIRE semantics until the lock is available
 		 * without waiting in the queue.
 		 */
 		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
 		return;
 	}
+queue:
 	atomic_sub(_QR_BIAS, &lock->cnts);

 	/*
@@ -60,6 +64,8 @@ EXPORT_SYMBOL(queued_read_lock_slowpath)
  */
 void queued_write_lock_slowpath(struct qrwlock *lock)
 {
+	unsigned long start;
+	int qw;
 	int cnts;

 	/* Put the writer into the wait queue */
@@ -70,12 +76,20 @@ void queued_write_lock_slowpath(struct q
 	    atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED))
 		goto unlock;

-	/* Set the waiting flag to notify readers that a writer is pending */
-	atomic_or(_QW_WAITING, &lock->cnts);
-
+	start = jiffies;
+	qw = 0;
 	/* When no more readers or writers, set the locked flag */
 	do {
-		cnts = atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
+		if (qw == 0 && start + 2 < jiffies) {
+			qw = _QW_WAITING;
+			/*
+			 * Set the waiting flag to notify readers that a writer is pending
+			 * only after waiting long enough - that is the price writer pays
+			 * for fairness
+			 */
+			atomic_or(_QW_WAITING, &lock->cnts);
+		}
+		cnts = atomic_cond_read_relaxed(&lock->cnts, VAL == qw);
 	} while (!atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED));
 unlock:
 	arch_spin_unlock(&lock->wait_lock);
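One small, optional suggestion on the diff above: the bare comparison
"start + 2 < jiffies" is not safe across jiffies wraparound. The usual
wraparound-safe form uses time_after() from <linux/jiffies.h>, roughly:

	/* Wraparound-safe version of the "waited long enough" test,
	 * reusing the qw/start locals introduced in the diff above. */
	if (qw == 0 && time_after(jiffies, start + 2)) {
		qw = _QW_WAITING;
		/* Only now ask readers to take the queued slow path. */
		atomic_or(_QW_WAITING, &lock->cnts);
	}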