Re: [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net

Chuck Lever III <chuck.lever@xxxxxxxxxx> · Fri, 26 Jan 2024 14:27:31 +0000

> On Jan 26, 2024, at 9:08 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> 
> On Fri, 2024-01-26 at 13:48 +0000, Chuck Lever III wrote:
>> 
>>> On Jan 26, 2024, at 8:01 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>>> 
>>> On Thu, 2024-01-25 at 16:56 -0500, Josef Bacik wrote:
>>>> On Thu, Jan 25, 2024 at 04:01:27PM -0500, Chuck Lever wrote:
>>>>> On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
>>>>>> This is the last global stat, move it into nfsd_net and adjust all the
>>>>>> users to use that variant instead of the global one.
>>>>> 
>>>>> Hm. I thought nfsd threads were a global resource -- they service
>>>>> all network namespaces. So, shouldn't the same thread count be
>>>>> surfaced to all containers? Won't they all see all of the nfsd
>>>>> processes?
>>> 
>>> Each container is going to start /proc/fs/nfsd/threads number of threads
>>> regardless. I hadn't actually grokked that they just get tossed onto the
>>> pile of threads that service requests.
>>> 
>>> Is is possible for one container to start a small number of threads but
>>> have its client load be such that it spills over and ends up stealing
>>> threads from other containers?
>> 
>> I haven't seen any code that manages resources based on namespace,
>> except in filecache.c to restrict writeback per namespace.
>> 
>> My impression is that any nfsd thread can serve any namespace. I'm
>> not sure it is currently meaningful for a particular net namespace to
>> "create" more threads.
>> 
>> If someone would like that level of control, we could implement a
>> cgroup mechanism and have one or more separate svc_pools per net
>> namespace, maybe? </hand wave>
>> 
> 
> AFAICT, the total number of threads on the system will be the sum of the
> threads started in each of the containers. They do just go into a big
> pile, and whichever one wakes up will service the request, so the
> threads aren't associated with the netns, per-se. The svc_rqst's however
> _are_ per-netns. So, I don't see anything that ensures that a container
> doesn't exceed the number of threads it started on its own behalf.
> 
> <hand wave>
> I'm not sure we'd need to tie this in to cgroups.

Not a need, but cgroups are typically the way that such
resource constraints are managed, that's all.

> Now that Josef is
> moving some of these key structures to be per-net, it should be fairly
> simple to have nfsd() just look at the th_cnt and the thread count in
> the current namespace, and just enqueue the RPC rather than doing it?
> </hand wave>
> 
> OTOH, maybe I'm overly concerned here.
> 
> 
>> 
>>>> I don't think we want the network namespaces seeing how many threads exist in
>>>> the entire system right?
>> 
>> If someone in a non-init net namespace does a "pgrep -c nfsd" don't
>> they see the total nfsd thread count for the host?
>> 
> 
> Yes, they're kernel threads and they aren't associated with a particular
> pid namespace.
> 
>> 
>>>> Additionally it appears that we can have multiple threads per network namespace,
>>>> so it's not like this will just show 1 for each individual nn, it'll show
>>>> however many threads have been configured for that nfsd in that network
>>>> namespace.
>> 
>> I've never tried this, so I'm speculating. But it seems like for
>> now, because all nfsd threads can serve all namespaces, they should
>> all see the global thread count stat.
>> 
>> Then later we can refine it.
>> 
> 
> I don't think that info is particularly useful though, and it certainly
> breaks expectations wrt container isolation.
> 
> Look at it this way:
> 
> Suppose I have access to a container and I spin up nfsd with a
> particular number of threads. I now want to know "did I spin up enough
> threads?"

It makes sense to me for a container to ask for one or more
NFSD listeners. But I'm not yet clear on what it means for
a container to try to alter the NFSD thread count, which is
currently global.

> By making this per-namespace as Josef suggests it should be
> fairly simple to tell whether my clients are regularly overrunning the
> threads I started. With this info as global, I have no idea what netns
> the RPCs being counted are against. I can't do anything with that info.

That's true. I'm just not sure we need to do anything
significant about it as part of this patch series.

I'm not at all advocating leaving it like this. I agree it
needs updating.

For now, just fill in that thread count stat with the global
thread count, and later when we have a better concept for
what it means to "adjust the nfsd thread count from inside a
container" we can come back and make it make sense.

--
Chuck Lever