On Wed, 05 Jul 2023, Chuck Lever III wrote:
> > On Jul 3, 2023, at 10:02 PM, Chuck Lever <cel@xxxxxxxxxx> wrote:
> > 
> > On Tue, Jul 04, 2023 at 11:26:22AM +1000, NeilBrown wrote:
> >> On Tue, 04 Jul 2023, Chuck Lever wrote:
> >>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >>> 
> >>> I've noticed that client-observed server request latency goes up
> >>> simply when the nfsd thread count is increased.
> >>> 
> >>> List walking is known to be memory-inefficient. On a busy server
> >>> with many threads, enqueuing a transport will walk the "all threads"
> >>> list quite frequently. This also pulls in the cache lines for some
> >>> hot fields in each svc_rqst (namely, rq_flags).
> >> 
> >> I think this text could usefully be re-written. By this point in the
> >> series we aren't list walking.
> >> 
> >> I'd also be curious to know what latency difference you get for just
> >> this change.
> > 
> > Not much of a latency difference at lower thread counts.
> > 
> > The difference I notice is that with the spinlock version of
> > pool_wake_idle_thread, there is significant lock contention as
> > the thread count increases, and the throughput result of my fio
> > test is lower (outside the result variance).
> 
> I mis-spoke. When I wrote this yesterday I had compared only the
> "xarray with bitmap" and the "xarray with spinlock" mechanisms.
> I had not tried "xarray only".
> 
> Today, while testing review-related fixes, I benchmarked "xarray
> only". It behaves like the linked-list implementation it replaces:
> performance degrades with anything more than a couple dozen threads
> in the pool.

I'm a little surprised it is that bad, but only a little.

The above is good text to include in the justification of that last
patch.

Thanks for the clarification.

NeilBrown
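
As a rough illustration of the trade-off being benchmarked in this
exchange, the sketch below contrasts the two strategies in a user-space
toy model: scanning every thread looking for an idle one (the shape of
the "list walk" / "xarray only" variants) versus consulting a bitmap of
idle threads (the shape of the "xarray with bitmap" variant). All names
here (toy_rqst, idle_bitmap, find_idle_by_*) are invented for
illustration; this is not the sunrpc code, and the real
svc_pool_wake_idle_thread must also handle concurrency with atomic bit
operations, which the toy omits.

```c
/*
 * Toy user-space model of the two wake-up strategies discussed above.
 * Illustrative only: the real code lives in net/sunrpc/ and differs.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_THREADS 64

struct toy_rqst {
	bool busy;	/* models a "busy" bit in per-thread flags */
	int  id;
};

static struct toy_rqst threads[MAX_THREADS];

/*
 * Strategy 1: scan every thread. Touches one entry (and its cache
 * line) per thread, so the cost grows with the pool size even when
 * most threads are idle.
 */
static struct toy_rqst *find_idle_by_scan(int nr_threads)
{
	for (int i = 0; i < nr_threads; i++)
		if (!threads[i].busy)
			return &threads[i];
	return NULL;
}

/*
 * Strategy 2: consult a bitmap of idle threads. One word covers 64
 * threads, so the lookup touches far less memory; a real version would
 * set and clear bits atomically as threads go busy or idle.
 */
static unsigned long idle_bitmap;	/* bit n set => thread n is idle */

static struct toy_rqst *find_idle_by_bitmap(void)
{
	if (!idle_bitmap)
		return NULL;
	int n = __builtin_ctzl(idle_bitmap);	/* lowest set bit */
	idle_bitmap &= ~(1UL << n);		/* mark it busy */
	return &threads[n];
}

int main(void)
{
	for (int i = 0; i < MAX_THREADS; i++) {
		threads[i] = (struct toy_rqst){ .busy = (i != 40), .id = i };
		if (!threads[i].busy)
			idle_bitmap |= 1UL << i;
	}
	printf("scan found thread %d\n", find_idle_by_scan(MAX_THREADS)->id);
	printf("bitmap found thread %d\n", find_idle_by_bitmap()->id);
	return 0;
}
```

The scan's cost and cache footprint grow with the number of nfsd
threads, which is consistent with the degradation Chuck reports for the
"xarray only" variant, while the bitmap lookup stays roughly constant
as the pool grows.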