On Fri, 30 Jun 2023, Chuck Lever wrote:
> Hi -
>
> Walking a linked list to find an idle thread is not CPU cache-
> friendly, and in fact I've noted palpable per-request latency
> impacts as the number of nfsd threads on the server increases.
>
> After discussing some possible improvements with Jeff at LSF/MM,
> I've been experimenting with the following series. I've measured an
> order of magnitude latency improvement in the thread lookup time,
> and have managed to keep the whole thing lockless.
>
> The only thing I don't like is that allocating the idle bitmaps in
> advance means we've got an /a priori/ cap on the number of NFSD
> threads that can be created. I'd love to find a way to enable
> the pool idle bitmaps to expand dynamically. Suggestions welcome.

Hi Chuck,
 The series looks good.  I did notice that patch 6/8 used UINT_MAX and
 U32_MAX in different places for the same number, though the next patch
 replaced them both with a new number, which is now the same in both
 places.

 I agree that an a priori cap on the number of threads is not ideal.
 Have you considered using the xarray to only store busy threads?
 I think its lookup mechanism mostly relies on a bitmap of present
 entries, but I'm not completely sure.

 That would require some extra work for svc_stop_threads(), which is the
 only place we are interested in threads that aren't busy.  We would
 need to record a target number of threads, and whenever a thread
 becomes idle it checks whether the number is exceeded.  If so, it exits,
 decrementing the number of threads; otherwise it re-inserts itself into
 the xa (if it cannot find a transport to handle).

 Alternately we could store bitmaps as values in an xarray, much like
 the ida code does.

Thanks,
NeilBrown
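
P.S. In case it helps, the target-count check described above could look
roughly like this.  It is only a userspace sketch using C11 atomics, not
kernel code, and struct pool_counts, pool_thread_should_exit() and
pool_set_target() are names made up for illustration; none of them
exist in the series.  The xarray re-insertion step is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-pool counters; these names are not from the series. */
struct pool_counts {
	atomic_uint nr_threads;	/* threads currently alive */
	atomic_uint target;	/* number of threads we want to keep */
};

/*
 * Called by a thread that has just become idle.  Returns true if the
 * caller should exit; in that case nr_threads has already been
 * decremented on its behalf.  Returns false if the caller should
 * carry on (look for a transport, or re-insert itself and sleep).
 */
static bool pool_thread_should_exit(struct pool_counts *pc)
{
	unsigned int cur = atomic_load(&pc->nr_threads);

	while (cur > atomic_load(&pc->target)) {
		/* Try to claim one exit; on failure cur is reloaded for us. */
		if (atomic_compare_exchange_weak(&pc->nr_threads, &cur, cur - 1))
			return true;
	}
	return false;
}

/* Stand-in for the stop-threads path: it only lowers the target. */
static void pool_set_target(struct pool_counts *pc, unsigned int target)
{
	atomic_store(&pc->target, target);
}
```

The compare-and-exchange loop means that two threads going idle at the
same moment cannot both exit for the same surplus slot, so the count
should not drop below the target.  A real version would presumably also
need to wake idle threads after lowering the target so that they notice.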