> On Jul 3, 2023, at 10:02 PM, Chuck Lever <cel@xxxxxxxxxx> wrote:
> 
> On Tue, Jul 04, 2023 at 11:26:22AM +1000, NeilBrown wrote:
>> On Tue, 04 Jul 2023, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>> 
>>> I've noticed that client-observed server request latency goes up
>>> simply when the nfsd thread count is increased.
>>> 
>>> List walking is known to be memory-inefficient. On a busy server
>>> with many threads, enqueuing a transport will walk the "all threads"
>>> list quite frequently. This also pulls in the cache lines for some
>>> hot fields in each svc_rqst (namely, rq_flags).
>> 
>> I think this text could usefully be re-written. By this point in the
>> series we aren't list walking.
>> 
>> I'd also be curious to know what latency difference you get for just this
>> change.
> 
> Not much of a latency difference at lower thread counts.
> 
> The difference I notice is that with the spinlock version of
> pool_wake_idle_thread, there is significant lock contention as
> the thread count increases, and the throughput result of my fio
> test is lower (outside the result variance).

I mis-spoke. When I wrote this yesterday I had compared only the
"xarray with bitmap" and the "xarray with spinlock" mechanisms. I had
not tried "xarray only".

Today, while testing review-related fixes, I benchmarked "xarray
only". It behaves like the linked-list implementation it replaces:
performance degrades with anything more than a couple dozen threads
in the pool.

--
Chuck Lever
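
[Editorial illustration, not code from the patch series: a minimal
userspace C sketch of the cache-line argument quoted above. The struct
layout, the flag bit, and the thread count are assumptions invented for
this example; the real server code lives in svc_rqst/svc_pool and uses
the kernel's xarray and bitmap helpers rather than the plain arrays
shown here.]

/*
 * Minimal userspace sketch of the cache-line argument -- NOT the nfsd
 * code. Struct layout, flag bits, and thread count are assumptions
 * made up for illustration.
 */
#include <stdint.h>
#include <stdio.h>

#define NR_THREADS 128

struct fake_rqst {
	unsigned long flags;	/* stand-in for svc_rqst::rq_flags */
	char pad[120];		/* keep consecutive flags words on different cache lines */
};

static struct fake_rqst threads[NR_THREADS];
static uint64_t idle_bits[NR_THREADS / 64];	/* one bit per idle thread */

/* "All threads" walk: reads one cache line per thread it inspects. */
static int find_idle_by_walking(void)
{
	for (int i = 0; i < NR_THREADS; i++)
		if (threads[i].flags & 1)	/* bit 0 == "idle" in this sketch */
			return i;
	return -1;
}

/* Bitmap lookup: one 64-bit word covers 64 threads. */
static int find_idle_by_bitmap(void)
{
	for (unsigned int w = 0; w < NR_THREADS / 64; w++)
		if (idle_bits[w])
			return w * 64 + __builtin_ctzll(idle_bits[w]);
	return -1;
}

int main(void)
{
	threads[97].flags = 1;			/* mark thread 97 idle */
	idle_bits[97 / 64] |= 1ULL << (97 % 64);

	printf("walk:   idle thread %d\n", find_idle_by_walking());
	printf("bitmap: idle thread %d\n", find_idle_by_bitmap());
	return 0;
}

With a large pool the walk touches roughly one cache line per busy
thread before it finds an idle one, while the bitmap lookup reads one
word per 64 threads; that difference is, as I understand it, what the
"xarray with bitmap" variant discussed above is exploiting.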