Re: [PATCH v2] locking/rwbase: Prevent indefinite writer starvation

Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> · Wed, 18 Jan 2023 16:25:57 +0100

On 2023-01-17 16:50:21 [+0000], Mel Gorman wrote:

> diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
> index c201aadb9301..99d81e8d1f25 100644
> --- a/kernel/locking/rwbase_rt.c
> +++ b/kernel/locking/rwbase_rt.c
> @@ -65,6 +69,64 @@ static __always_inline int rwbase_read_trylock(struct rwbase_rt *rwb)
>  	return 0;
>  }
>  
> +/*
> + * Allow reader bias with a pending writer for a minimum of 4ms or 1 tick.
> + * This matches RWSEM_WAIT_TIMEOUT for the generic RWSEM implementation.
> + * The granularity is not exact as the lowest bit in rwbase_rt->waiter_timeout
> + * is used to detect recent DL / RT tasks taking a read lock.
> + */
> +#define RWBASE_RT_WAIT_TIMEOUT DIV_ROUND_UP(HZ, 250)
> +
> +static void __sched update_dlrt_reader(struct rwbase_rt *rwb)
> +{
> +	/* No update required if DL / RT tasks already identified. */
> +	if (rwb->waiter_timeout & 1)
> +		return;
> +
> +	/*
> +	 * Record a DL / RT task acquiring the lock for read. This may result
> +	 * in indefinite writer starvation but DL / RT tasks should avoid such
> +	 * behaviour.
> +	 */
> +	if (rt_task(current)) {
> +		struct rt_mutex_base *rtm = &rwb->rtmutex;
> +		unsigned long flags;
> +
> +		raw_spin_lock_irqsave(&rtm->wait_lock, flags);
> +		rwb->waiter_timeout |= 1;

Let me see of I parsed the whole logic right:

_After_ the RT reader acquired the lock, the lowest bit is set. This may
be immediately if the timeout did not occur yet.
With this flag set, all following reader incl. SCHED_OTHER will acquire
the lock.

If so, then I don't know why this is a good idea.

If _only_ the RT reader is allowed to acquire the lock while the writer
is waiting then it make sense to prefer the RT tasks. (So the check is
on current and not on the lowest bit).
All other (SCHED_OTHER) reader would have to block on the rtmutex after
the timeout. This makes sense to avoid the starvation.

If we drop that "we prefer the RT reader" then it would block on the
RTmutex. It will _still_ be preferred over the writer because it will be
enqueued before the writer in the queue due to its RT priority. The only
downside is that it has to wait until all readers are left.
So by allowing the RT reader to always acquire the lock as long as the
WRITER_BIAS isn't set, we would allow to enter early while the other
reader are still in and after the timeout you would only have RT reader
going in and out. All SCHED_OTHER reader block on the RTmutex.

I think I like this.

> +		raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
> +	}
> +}
> +

Sebastian