Re: [RFC PATCH] locking/rwbase: Prevent indefinite writer starvation


 



On Mon, 09 Jan 2023, Peter Zijlstra wrote:

On Fri, Jan 06, 2023 at 02:27:43PM +0000, Mel Gorman wrote:
rw_semaphore and rwlock are explicitly unfair to writers in the presence
of readers by design with a PREEMPT_RT configuration. Commit 943f0edb754f
("locking/rt: Add base code for RT rw_semaphore and rwlock") notes:

	The implementation is writer unfair, as it is not feasible to do
	priority inheritance on multiple readers, but experience has shown
	that real-time workloads are not the typical workloads which are
	sensitive to writer starvation.

While atypical, it's also trivial to block writers with PREEMPT_RT
indefinitely without ever making forward progress. Since LTP-20220121,
the dio_truncate test case went from having 1 reader to having 16 readers
and the number of readers is sufficient to prevent the down_write ever
succeeding while readers exist. Ultimately the test is killed after 30
minutes as a failure.

dio_truncate is not a realtime application but indefinite writer starvation
is undesirable. The test case has one writer appending and truncating files
A and B while multiple readers read file A. The readers and writer contend
for one file's inode lock, which the writer never acquires because the
readers keep reading until the writer is done, which never happens.

This patch records a timestamp when the first writer is blocked. Reader
bias is allowed until the first writer has been blocked for a minimum of
4ms and a maximum of (4ms + 1 jiffy). The cutoff time is arbitrary on
the assumption that a hard realtime application missing a 4ms deadline
would not need PREEMPT_RT. It's expected that hard realtime applications
avoid such heavy reader/writer contention by design. On a test machine,
the test completed in 92 seconds.

 static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
				      unsigned int state)
 {
@@ -76,7 +79,8 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
	 * Allow readers, as long as the writer has not completely
	 * acquired the semaphore for write.
	 */
-	if (atomic_read(&rwb->readers) != WRITER_BIAS) {
+	if (atomic_read(&rwb->readers) != WRITER_BIAS &&
+	    jiffies - rwb->waiter_blocked < RW_CONTENTION_THRESHOLD) {
		atomic_inc(&rwb->readers);
		raw_spin_unlock_irq(&rtm->wait_lock);
		return 0;
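
For illustration only, the patched condition in the hunk above can be modelled in user space. This is a minimal sketch, not the kernel code: struct rwbase_sketch, SKETCH_WRITER_BIAS, SKETCH_THRESHOLD and reader_may_enter() are hypothetical names with made-up values, a plain tick counter stands in for jiffies, and waiter_blocked is assumed to have been set when the first writer blocked.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel definitions; values are assumptions. */
#define SKETCH_WRITER_BIAS  (1U << 30)
#define SKETCH_THRESHOLD    2UL  /* cutoff in ticks, chosen for illustration */

struct rwbase_sketch {
	unsigned int  readers;        /* reader count, or the bias value once a writer owns the lock */
	unsigned long waiter_blocked; /* tick at which the first writer blocked */
};

/*
 * Models the patched check: a reader may take the lock only if no writer
 * fully owns it AND the first blocked writer has waited for less than the
 * threshold. The unsigned subtraction keeps the computed wait time correct
 * when the tick counter wraps around.
 */
static bool reader_may_enter(const struct rwbase_sketch *rwb, unsigned long now)
{
	return rwb->readers != SKETCH_WRITER_BIAS &&
	       now - rwb->waiter_blocked < SKETCH_THRESHOLD;
}
```

With waiter_blocked = 100 and a threshold of 2 ticks, readers arriving at ticks 100 and 101 are still admitted, while a reader arriving at tick 102 has to queue behind the writer.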

Blergh.

So a number of comments:

- this deserves a giant comment, not only an obscure extra condition.

- this would be better if it were limited to only have effect
  when there are no RT/DL tasks involved.

Agreed.
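
One possible reading of the RT/DL restriction suggested above, sketched in user space: the cutoff only applies when neither the arriving reader nor the blocked writer is an RT/DL task, otherwise the original reader-biased behaviour is kept. Everything here is hypothetical (enum sketch_class, SKETCH_THRESHOLD, reader_must_wait()); in the kernel, helpers such as rt_task() and dl_task() would presumably take the place of the class check.

```c
#include <stdbool.h>

/* Hypothetical scheduling classes, for illustration only. */
enum sketch_class { SKETCH_FAIR, SKETCH_RT, SKETCH_DL };

#define SKETCH_THRESHOLD 2UL /* cutoff in ticks, an assumed value */

static bool is_rt_or_dl(enum sketch_class c)
{
	return c == SKETCH_RT || c == SKETCH_DL;
}

/*
 * Returns true if an arriving reader should queue behind the blocked writer.
 * If either side is an RT/DL task the cutoff is skipped entirely, so the
 * lock stays reader-biased exactly as before. Otherwise reader bias ends
 * once the writer has waited for at least the threshold.
 */
static bool reader_must_wait(enum sketch_class reader, enum sketch_class writer,
			     unsigned long now, unsigned long writer_blocked_at)
{
	if (is_rt_or_dl(reader) || is_rt_or_dl(writer))
		return false;

	return now - writer_blocked_at >= SKETCH_THRESHOLD;
}
```

Whether the check should look at the reader, the writer, or both is a design decision this sketch does not settle; it only shows where such a test could sit relative to the timeout.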

(Sorry for hijacking this thread, also more Cc)

Hmm, this reminds me of the epoll rwlock situation[1, 2], which does the lockless
ready event list updates from irq callback context and hits the writer-unfair
scenario, which was really designed for tasklist_lock. Converting the read_lock
to RCU looks like a no-go because this is not a read-mostly pattern, far from
it actually. And in fact the read path is not at all a read path (ie: simply
traversing the list(s)). We also probably hit the "unfair is good for throughput"
condition mentioned by Linus, as these are spinning locks, and thus have a critical
region too short to really benefit from actual concurrent readers.

So while the numbers in a218cc491420 ("epoll: use rwlock in order to reduce
ep_poll_callback() contention") are very nice, based on the above and the fact
that, per the changelog, it does misassume the fairness, I would vote for removing
the lockless stuff and returning to simply using a spinlock (epoll is wacky enough
already). It is ultimately less burden on the kernel, and I suspect that people who
really care about epoll performance will mostly be looking at io_uring.

Thanks,
Davidlohr

[1] https://lore.kernel.org/all/20210825132754.GA895675@lothringen/
[2] https://lore.kernel.org/all/20220617091039.2257083-1-eric.dumazet@xxxxxxxxx/


This made me re-read the phase-fair rwlock paper and again note that the RW
semaphore (i.e. blocking) variant was deferred to future work, and AFAICT
this future hasn't happened yet :/

AFAICT it would still require boosting the readers (something tglx still
has nightmares of) and limiting reader concurrency, another thing that
hurts.




