On Wed, 05 Jul 2023, Chuck Lever III wrote:
> > On Jul 3, 2023, at 10:02 PM, Chuck Lever <cel@xxxxxxxxxx> wrote:
> > 
> > On Tue, Jul 04, 2023 at 11:26:22AM +1000, NeilBrown wrote:
> >> On Tue, 04 Jul 2023, Chuck Lever wrote:
> >>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >>> 
> >>> I've noticed that client-observed server request latency goes up
> >>> simply when the nfsd thread count is increased.
> >>> 
> >>> List walking is known to be memory-inefficient. On a busy server
> >>> with many threads, enqueuing a transport will walk the "all threads"
> >>> list quite frequently. This also pulls in the cache lines for some
> >>> hot fields in each svc_rqst (namely, rq_flags).
> >> 
> >> I think this text could usefully be re-written. By this point in the
> >> series we aren't list walking.
> >> 
> >> I'd also be curious to know what latency difference you get for just
> >> this change.
> > 
> > Not much of a latency difference at lower thread counts.
> > 
> > The difference I notice is that with the spinlock version of
> > pool_wake_idle_thread, there is significant lock contention as
> > the thread count increases, and the throughput result of my fio
> > test is lower (outside the result variance).
> 
> I mis-spoke. When I wrote this yesterday I had compared only the
> "xarray with bitmap" and the "xarray with spinlock" mechanisms.
> I had not tried "xarray only".
> 
> Today, while testing review-related fixes, I benchmarked "xarray
> only". It behaves like the linked-list implementation it replaces:
> performance degrades with anything more than a couple dozen threads
> in the pool.

I'm a little surprised it is that bad, but only a little.

The above is good text to include in the justification of that last
patch.

Thanks for the clarification.

NeilBrown
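
As a rough illustration of the trade-off being benchmarked in this
exchange, the sketch below contrasts the two strategies in a user-space
toy model: scanning every thread looking for an idle one (the shape of
the "list walk" / "xarray only" variants) versus consulting a bitmap of
idle threads (the shape of the "xarray with bitmap" variant). All names
here (toy_rqst, idle_bitmap, find_idle_by_*) are invented for
illustration; this is not the sunrpc code, and the real
svc_pool_wake_idle_thread must also handle concurrency with atomic bit
operations, which the toy omits.

```c
/*
 * Toy user-space model of the two wake-up strategies discussed above.
 * Illustrative only: the real code lives in net/sunrpc/ and differs.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_THREADS 64

struct toy_rqst {
	bool busy;	/* models a "busy" bit in per-thread flags */
	int  id;
};

static struct toy_rqst threads[MAX_THREADS];

/*
 * Strategy 1: scan every thread. Touches one entry (and its cache
 * line) per thread, so the cost grows with the pool size even when
 * most threads are idle.
 */
static struct toy_rqst *find_idle_by_scan(int nr_threads)
{
	for (int i = 0; i < nr_threads; i++)
		if (!threads[i].busy)
			return &threads[i];
	return NULL;
}

/*
 * Strategy 2: consult a bitmap of idle threads. One word covers 64
 * threads, so the lookup touches far less memory; a real version would
 * set and clear bits atomically as threads go busy or idle.
 */
static unsigned long idle_bitmap;	/* bit n set => thread n is idle */

static struct toy_rqst *find_idle_by_bitmap(void)
{
	if (!idle_bitmap)
		return NULL;
	int n = __builtin_ctzl(idle_bitmap);	/* lowest set bit */
	idle_bitmap &= ~(1UL << n);		/* mark it busy */
	return &threads[n];
}

int main(void)
{
	for (int i = 0; i < MAX_THREADS; i++) {
		threads[i] = (struct toy_rqst){ .busy = (i != 40), .id = i };
		if (!threads[i].busy)
			idle_bitmap |= 1UL << i;
	}
	printf("scan found thread %d\n", find_idle_by_scan(MAX_THREADS)->id);
	printf("bitmap found thread %d\n", find_idle_by_bitmap()->id);
	return 0;
}
```

The scan's cost and cache footprint grow with the number of nfsd
threads, which is consistent with the degradation Chuck reports for the
"xarray only" variant, while the bitmap lookup stays roughly constant
as the pool grows.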