Re: [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Dec 20, 2021, at 2:52 PM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Mon, 2021-12-20 at 19:02 +0000, Chuck Lever III wrote:
>> 
>> 
>>> On Dec 20, 2021, at 1:35 PM, Trond Myklebust
>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>> 
>>> On Mon, 2021-12-20 at 15:51 +0000, Chuck Lever III wrote:
>>>> 
>>>> 
>>>>> On Dec 19, 2021, at 3:49 PM, Trond Myklebust
>>>>> <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>> 
>>>>> On Sun, 2021-12-19 at 18:15 +0000, Chuck Lever III wrote:
>>>>>> 
>>>>>>> On Dec 18, 2021, at 8:38 PM, trondmy@xxxxxxxxxx wrote:
>>>>>>> 
>>>>>>> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>>>>>>> 
>>>>>>> NFSv4 doesn't need rpcbind, so let's not refuse to start up
>>>>>>> just
>>>>>>> because
>>>>>>> the rpcbind registration failed.
>>>>>> 
>>>>>> Commit 7e55b59b2f32 ("SUNRPC/NFSD: Support a new option for
>>>>>> ignoring
>>>>>> the result of svc_register") added vs_rpcb_optnl, which is
>>>>>> already
>>>>>> set for nfsd4_version4. Is that not adequate?
>>>>>> 
>>>>> 
>>>>> The other issue is that under certain circumstances, you may
>>>>> also
>>>>> want
>>>>> to run NFSv3 without rpcbind support. For instance, when you
>>>>> have a
>>>>> knfsd server instance running as a data server, there is
>>>>> typically
>>>>> no
>>>>> need to run rpcbind.
>>>> 
>>>> So what you are saying is that you'd like this to be a run-time
>>>> setting
>>>> instead of a compile-time setting. Got it.
>>>> 
>>>> Note that this patch adds a non-container-aware administrative
>>>> API.
>>>> For
>>>> the same reasons I NAK'd 9/10, I'm going to NAK this one and ask
>>>> that
>>>> you provide a version that is container-aware (ideally: not a
>>>> module
>>>> parameter).
>>>> 
>>>> The new implementation should remove vs_rpcb_optnl, as a clean
>>>> up.
>>>> 
>>>> 
>>> 
>>> This is not something that turns off the registration with rpcbind.
>>> It
>>> is something that turns off the decision to abort knfsd if that
>>> registration fails.
>> 
>> See below, that's just what vs_rpcb_optnl does. It it's true,
>> then the result of the rpcbind registration for that service
>> is ignored.
>> 
>> 1039 int svc_generic_rpcbind_set(struct net *net,
>> 1040                             const struct svc_program *progp,
>> 1041                             u32 version, int family,
>> 1042                             unsigned short proto,
>> 1043                             unsigned short port)
>> 1044 {
>> 1045         const struct svc_version *vers = progp-
>>> pg_vers[version];
>> 1046         int error;
>> 1047 
>> 1048         if (vers == NULL)
>> 1049                 return 0;
>> 1050 
>> 1051         if (vers->vs_hidden) {
>> 1052                 trace_svc_noregister(progp->pg_name, version,
>> proto,
>> 1053                                      port, family, 0);
>> 1054                 return 0;
>> 1055         }
>> 1056 
>> 1057         /*
>> 1058          * Don't register a UDP port if we need congestion
>> 1059          * control.
>> 1060          */
>> 1061         if (vers->vs_need_cong_ctrl && proto == IPPROTO_UDP)
>> 1062                 return 0;
>> 1063 
>> 1064         error = svc_rpcbind_set_version(net, progp, version,
>> 1065                                         family, proto, port);
>> 1066 
>> 1067         return (vers->vs_rpcb_optnl) ? 0 : error;
>> 1068 }
>> 1069 EXPORT_SYMBOL_GPL(svc_generic_rpcbind_set);
>> 
>> And, it's already the case that vs_rpcb_optnl is true for our
>> NFSv4 server. So the issue is for NFSv3 only. It still looks
>> to me like the patch description for this patch, at the very
>> least, is not correct.
>> 
>> 
>>> That's not something that needs to be
>>> containerised: it's just common sense and really wants to be the
>>> default behaviour everywhere.
>> 
>> I'm not following this. I can imagine deployment cases where one
>> container might want to ensure rpcbind is running for NFSv3 while
>> another does not care. What am I missing?
>> 
> 
> Monitoring that rpcbind is working is not a kernel task. It's something
> that can and should be done in userspace. The kernel task should only
> be to try to register the ports that it is using to that rpcbind server
> if it is up and running.
> 
> In a containerised environment, then even the job of registering the
> ports is not necessarily desirable behaviour, since there will almost
> always be some additional form of NAT or address + port mapping going
> on that knfsd has no visibility into.

Today, the kernel fails nfsd start-up if it can't complete rpcbind
registration. If we want to change that, then the typical process for
this has been that the user space components to support it need to be
in place before we think of changing kernel behavior. Are they?

And, if knfsd is really not the place for checking the registration,
then the right answer is to remove the in-kernel check for everyone,
but only after a proper user space mechanism is in place.

I really don't want to have to carry a module parameter for this
transition in perpetuity. A better choice for managing the transition
would be a CONFIG option, which is relatively harmless to remove once
it is no longer needed.

I don't object to eventually removing the kernel registration check,
but I want to see the transition away from the current arrangement
done properly, please. At the very least we deserve to have some
dialog about the long-term direction.


--
Chuck Lever







[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux