On Fri, 2024-01-26 at 13:48 +0000, Chuck Lever III wrote:
>
> > On Jan 26, 2024, at 8:01 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > On Thu, 2024-01-25 at 16:56 -0500, Josef Bacik wrote:
> > > On Thu, Jan 25, 2024 at 04:01:27PM -0500, Chuck Lever wrote:
> > > > On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
> > > > > This is the last global stat, move it into nfsd_net and adjust all the
> > > > > users to use that variant instead of the global one.
> > > >
> > > > Hm. I thought nfsd threads were a global resource -- they service
> > > > all network namespaces. So, shouldn't the same thread count be
> > > > surfaced to all containers? Won't they all see all of the nfsd
> > > > processes?
> >
> > Each container is going to start /proc/fs/nfsd/threads number of threads
> > regardless. I hadn't actually grokked that they just get tossed onto the
> > pile of threads that service requests.
> >
> > Is is possible for one container to start a small number of threads but
> > have its client load be such that it spills over and ends up stealing
> > threads from other containers?
>
> I haven't seen any code that manages resources based on namespace,
> except in filecache.c to restrict writeback per namespace.
>
> My impression is that any nfsd thread can serve any namespace. I'm
> not sure it is currently meaningful for a particular net namespace to
> "create" more threads.
>
> If someone would like that level of control, we could implement a
> cgroup mechanism and have one or more separate svc_pools per net
> namespace, maybe? </hand wave>
>

AFAICT, the total number of threads on the system will be the sum of the
threads started in each of the containers. They do just go into a big
pile, and whichever one wakes up will service the request, so the
threads aren't associated with the netns, per-se. The svc_rqst's however
_are_ per-netns. So, I don't see anything that ensures that a container
doesn't exceed the number of threads it started on its own behalf.

<hand wave>
I'm not sure we'd need to tie this in to cgroups. Now that Josef is
moving some of these key structures to be per-net, it should be fairly
simple to have nfsd() just look at the th_cnt and the thread count in
the current namespace, and just enqueue the RPC rather than doing it?
</hand wave>

OTOH, maybe I'm overly concerned here.

> > > I don't think we want the network namespaces seeing how many threads exist in
> > > the entire system right?
>
> If someone in a non-init net namespace does a "pgrep -c nfsd" don't
> they see the total nfsd thread count for the host?
>

Yes, they're kernel threads and they aren't associated with a particular
pid namespace.

> > > Additionally it appears that we can have multiple threads per network namespace,
> > > so it's not like this will just show 1 for each individual nn, it'll show
> > > however many threads have been configured for that nfsd in that network
> > > namespace.
>
> I've never tried this, so I'm speculating. But it seems like for
> now, because all nfsd threads can serve all namespaces, they should
> all see the global thread count stat.
>
> Then later we can refine it.
>

I don't think that info is particularly useful though, and it certainly
breaks expectations wrt container isolation.

Look at it this way:

Suppose I have access to a container and I spin up nfsd with a
particular number of threads. I now want to know "did I spin up enough
threads?"

By making this per-namespace as Josef suggests it should be fairly
simple to tell whether my clients are regularly overrunning the threads
I started. With this info as global, I have no idea what netns the RPCs
being counted are against. I can't do anything with that info.

> > > I'm good either way, but it makes sense to me to only surface the network
> > > namespace related thread count. I could probably have a global counter and only
> > > surface the global counter if net == &init_net. Let me know what you prefer.
>

-- 
Jeff Layton <jlayton@xxxxxxxxxx>