On Tue, 11 Jul 2023, Chuck Lever III wrote: > > > On Jul 10, 2023, at 6:29 PM, NeilBrown <neilb@xxxxxxx> wrote: > > > > On Tue, 11 Jul 2023, Chuck Lever wrote: > >> From: Chuck Lever <chuck.lever@xxxxxxxxxx> > >> > >> Measure a source of thread scheduling inefficiency -- count threads > >> that were awoken but found that the transport queue had already been > >> emptied. > >> > >> An empty transport queue is possible when threads that run between > >> the wake_up_process() call and the woken thread returning from the > >> scheduler have pulled all remaining work off the transport queue > >> using the first svc_xprt_dequeue() in svc_get_next_xprt(). > > > > I'm in two minds about this. The data being gathered here is > > potentially useful > > It's actually pretty shocking: I've measured more than > 15% of thread wake-ups find no work to do. That is a bigger number than I would have guessed! > > > > - but who it is useful to? > > I think it is primarily useful for us - to understand the behaviour of > > the implementation so we can know what needs to be optimised. > > It isn't really of any use to a sysadmin who wants to understand how > > their system is performing. > > > > But then .. who are tracepoints for? Developers or admins? > > I guess that fact that we feel free to modify them whenever we need > > means they are primarily for developers? In which case this is a good > > patch, and maybe we'll revert the functionality one day if it turns out > > that we can change the implementation so that a thread is never woken > > when there is no work to do .... > > A reasonable question to ask. The new "starved" metric > is similar: possibly useful while we are developing the > code, but not particularly valuable for system > administrators. > > How are the pool_stats used by administrators? That is a fair question. Probably not much. Once upon a time we had stats which could show a histogram how thread usage. I used that to decided if the load justified more threads. But we removed it because it was somewhat expensive and it was argued it may not be all that useful... I haven't really looked at any other stats in my work. Almost always it is a packet capture that helps me see what is happening when I have an issue to address. Maybe I should just accept that stats are primarily for developers and they can be incredible useful for that purpose, and not worry if admins might ever need them. > > (And, why are they in /proc/fs/nfsd/ rather than under > something RPC-related?) Maybe because we "owned" /proc/fs/nfsd/, but the RPC-related stuff is under "net" and we didn't feel so comfortable sticking new stuff there. Or maybe not. Thanks, NeilBrown