> On Jul 3, 2023, at 10:02 PM, Chuck Lever <cel@xxxxxxxxxx> wrote:
> 
> On Tue, Jul 04, 2023 at 11:26:22AM +1000, NeilBrown wrote:
>> On Tue, 04 Jul 2023, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>> 
>>> I've noticed that client-observed server request latency goes up
>>> simply when the nfsd thread count is increased.
>>> 
>>> List walking is known to be memory-inefficient. On a busy server
>>> with many threads, enqueuing a transport will walk the "all threads"
>>> list quite frequently. This also pulls in the cache lines for some
>>> hot fields in each svc_rqst (namely, rq_flags).
>> 
>> I think this text could usefully be re-written. By this point in the
>> series we aren't list walking.
>> 
>> I'd also be curious to know what latency difference you get for just this
>> change.
> 
> Not much of a latency difference at lower thread counts.
> 
> The difference I notice is that with the spinlock version of
> pool_wake_idle_thread, there is significant lock contention as
> the thread count increases, and the throughput result of my fio
> test is lower (outside the result variance).

I mis-spoke. When I wrote this yesterday I had compared only the
"xarray with bitmap" and the "xarray with spinlock" mechanisms. I had
not tried "xarray only".

Today, while testing review-related fixes, I benchmarked "xarray
only". It behaves like the linked-list implementation it replaces:
performance degrades with anything more than a couple dozen threads
in the pool.

--
Chuck Lever
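
[Editorial illustration, not code from the patch series: a minimal
userspace C sketch of the cache-line argument quoted above. The struct
layout, the flag bit, and the thread count are assumptions invented for
this example; the real server code lives in svc_rqst/svc_pool and uses
the kernel's xarray and bitmap helpers rather than the plain arrays
shown here.]

/*
 * Minimal userspace sketch of the cache-line argument -- NOT the nfsd
 * code. Struct layout, flag bits, and thread count are assumptions
 * made up for illustration.
 */
#include <stdint.h>
#include <stdio.h>

#define NR_THREADS 128

struct fake_rqst {
	unsigned long flags;	/* stand-in for svc_rqst::rq_flags */
	char pad[120];		/* keep consecutive flags words on different cache lines */
};

static struct fake_rqst threads[NR_THREADS];
static uint64_t idle_bits[NR_THREADS / 64];	/* one bit per idle thread */

/* "All threads" walk: reads one cache line per thread it inspects. */
static int find_idle_by_walking(void)
{
	for (int i = 0; i < NR_THREADS; i++)
		if (threads[i].flags & 1)	/* bit 0 == "idle" in this sketch */
			return i;
	return -1;
}

/* Bitmap lookup: one 64-bit word covers 64 threads. */
static int find_idle_by_bitmap(void)
{
	for (unsigned int w = 0; w < NR_THREADS / 64; w++)
		if (idle_bits[w])
			return w * 64 + __builtin_ctzll(idle_bits[w]);
	return -1;
}

int main(void)
{
	threads[97].flags = 1;			/* mark thread 97 idle */
	idle_bits[97 / 64] |= 1ULL << (97 % 64);

	printf("walk:   idle thread %d\n", find_idle_by_walking());
	printf("bitmap: idle thread %d\n", find_idle_by_bitmap());
	return 0;
}

With a large pool the walk touches roughly one cache line per busy
thread before it finds an idle one, while the bitmap lookup reads one
word per 64 threads; that difference is, as I understand it, what the
"xarray with bitmap" variant discussed above is exploiting.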