Re: [PATCH v3 5/9] SUNRPC: Count pool threads that were awoken but found no work to do

"NeilBrown" <neilb@xxxxxxx> · Tue, 11 Jul 2023 19:54:53 +1000

On Tue, 11 Jul 2023, Chuck Lever III wrote:
> 
> > On Jul 10, 2023, at 6:29 PM, NeilBrown <neilb@xxxxxxx> wrote:
> > 
> > On Tue, 11 Jul 2023, Chuck Lever wrote:
> >> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >> 
> >> Measure a source of thread scheduling inefficiency -- count threads
> >> that were awoken but found that the transport queue had already been
> >> emptied.
> >> 
> >> An empty transport queue is possible when threads that run between
> >> the wake_up_process() call and the woken thread returning from the
> >> scheduler have pulled all remaining work off the transport queue
> >> using the first svc_xprt_dequeue() in svc_get_next_xprt().
> > 
> > I'm in two minds about this.  The data being gathered here is
> > potentially useful
> 
> It's actually pretty shocking: I've measured more than
> 15% of thread wake-ups find no work to do.

That is a bigger number than I would have guessed!
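
For anyone following along, the race in the patch description can be
modelled with a toy, deterministic replay.  This is only a sketch and
not kernel code: simulate() and the event names "enqueue", "steal" and
"run" are made up for illustration, and it assumes one wake-up per
queued item.  A "steal" stands for a thread that was already running
and dequeued work between the wake-up and the woken thread getting
the CPU.

```python
from collections import deque


def simulate(events):
    """Replay a fixed schedule; return (wakeups, empty_wakeups).

    Hypothetical events for this toy model:
      "enqueue" - work arrives: queue it and wake one sleeping thread
      "steal"   - an already-running thread dequeues one item, if any
      "run"     - a previously woken thread is scheduled and tries to dequeue
    """
    queue = deque()
    pending = 0        # threads woken but not yet scheduled
    wakeups = 0        # total wake-ups issued
    empty_wakeups = 0  # wake-ups that found the queue already empty
    next_item = 0

    for ev in events:
        if ev == "enqueue":
            queue.append(next_item)
            next_item += 1
            pending += 1
            wakeups += 1
        elif ev == "steal":
            if queue:
                queue.popleft()
        elif ev == "run":
            if pending == 0:
                continue  # nobody was woken, nothing to schedule
            pending -= 1
            if queue:
                queue.popleft()
            else:
                empty_wakeups += 1
        else:
            raise ValueError(f"unknown event: {ev!r}")

    return wakeups, empty_wakeups


if __name__ == "__main__":
    # One item arrives, a running thread takes it first, then the
    # woken thread runs and finds nothing.
    print(simulate(["enqueue", "steal", "run"]))
```

With no "steal" between "enqueue" and "run" the woken thread always
finds its item, so the empty count only grows when another thread
gets in first.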

> 
> 
> > - but who is it useful to?
> > I think it is primarily useful for us - to understand the behaviour of
> > the implementation so we can know what needs to be optimised.
> > It isn't really of any use to a sysadmin who wants to understand how
> > their system is performing.
> > 
> > But then .. who are tracepoints for?  Developers or admins?
> > I guess the fact that we feel free to modify them whenever we need
> > means they are primarily for developers?  In which case this is a good
> > patch, and maybe we'll revert the functionality one day if it turns out
> > that we can change the implementation so that a thread is never woken
> > when there is no work to do ....
> 
> A reasonable question to ask. The new "starved" metric
> is similar: possibly useful while we are developing the
> code, but not particularly valuable for system
> administrators.
> 
> How are the pool_stats used by administrators?

That is a fair question.  Probably not much.
Once upon a time we had stats which could show a histogram of thread
usage.  I used that to decide if the load justified more threads.
But we removed it because it was somewhat expensive and it was argued it
may not be all that useful...
I haven't really looked at any other stats in my work.  Almost always it
is a packet capture that helps me see what is happening when I have an
issue to address.

Maybe I should just accept that stats are primarily for developers and
they can be incredibly useful for that purpose, and not worry if admins
might ever need them.

> 
> (And, why are they in /proc/fs/nfsd/ rather than under
> something RPC-related?)

Maybe because we "owned" /proc/fs/nfsd/, but the RPC-related stuff is
under "net" and we didn't feel so comfortable sticking new stuff there.
Or maybe not.

Thanks,
NeilBrown