Re: [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net

Jeff Layton <jlayton@xxxxxxxxxx> · Fri, 26 Jan 2024 09:08:28 -0500

On Fri, 2024-01-26 at 13:48 +0000, Chuck Lever III wrote:
> 
> > On Jan 26, 2024, at 8:01 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > 
> > On Thu, 2024-01-25 at 16:56 -0500, Josef Bacik wrote:
> > > On Thu, Jan 25, 2024 at 04:01:27PM -0500, Chuck Lever wrote:
> > > > On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
> > > > > This is the last global stat, move it into nfsd_net and adjust all the
> > > > > users to use that variant instead of the global one.
> > > > 
> > > > Hm. I thought nfsd threads were a global resource -- they service
> > > > all network namespaces. So, shouldn't the same thread count be
> > > > surfaced to all containers? Won't they all see all of the nfsd
> > > > processes?
> > 
> > Each container is going to start /proc/fs/nfsd/threads number of threads
> > regardless. I hadn't actually grokked that they just get tossed onto the
> > pile of threads that service requests.
> > 
> > Is is possible for one container to start a small number of threads but
> > have its client load be such that it spills over and ends up stealing
> > threads from other containers?
> 
> I haven't seen any code that manages resources based on namespace,
> except in filecache.c to restrict writeback per namespace.
> 
> My impression is that any nfsd thread can serve any namespace. I'm
> not sure it is currently meaningful for a particular net namespace to
> "create" more threads.
> 
> If someone would like that level of control, we could implement a
> cgroup mechanism and have one or more separate svc_pools per net
> namespace, maybe? </hand wave>
> 

AFAICT, the total number of threads on the system will be the sum of the
threads started in each of the containers. They do just go into a big
pile, and whichever one wakes up will service the request, so the
threads aren't associated with the netns, per-se. The svc_rqst's however
_are_ per-netns. So, I don't see anything that ensures that a container
doesn't exceed the number of threads it started on its own behalf.

<hand wave>
I'm not sure we'd need to tie this in to cgroups. Now that Josef is
moving some of these key structures to be per-net, it should be fairly
simple to have nfsd() just look at the th_cnt and the thread count in
the current namespace, and just enqueue the RPC rather than doing it?
</hand wave>

OTOH, maybe I'm overly concerned here.

> 
> > > I don't think we want the network namespaces seeing how many threads exist in
> > > the entire system right?
> 
> If someone in a non-init net namespace does a "pgrep -c nfsd" don't
> they see the total nfsd thread count for the host?
> 

Yes, they're kernel threads and they aren't associated with a particular
pid namespace.

> 
> > > Additionally it appears that we can have multiple threads per network namespace,
> > > so it's not like this will just show 1 for each individual nn, it'll show
> > > however many threads have been configured for that nfsd in that network
> > > namespace.
> 
> I've never tried this, so I'm speculating. But it seems like for
> now, because all nfsd threads can serve all namespaces, they should
> all see the global thread count stat.
> 
> Then later we can refine it.
> 

I don't think that info is particularly useful though, and it certainly
breaks expectations wrt container isolation.

Look at it this way:

Suppose I have access to a container and I spin up nfsd with a
particular number of threads. I now want to know "did I spin up enough
threads?" By making this per-namespace as Josef suggests it should be
fairly simple to tell whether my clients are regularly overrunning the
threads I started. With this info as global, I have no idea what netns
the RPCs being counted are against. I can't do anything with that info.

> 
> > > I'm good either way, but it makes sense to me to only surface the network
> > > namespace related thread count.  I could probably have a global counter and only
> > > surface the global counter if net == &init_net.  Let me know what you prefer.
> 

-- 
Jeff Layton <jlayton@xxxxxxxxxx>