Re: [nfs-utils PATCH] nfsdctl: debug logging fixups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2025-01-16 at 16:00 -0500, Steve Dickson wrote:
> 
> On 1/16/25 6:50 AM, Jeff Layton wrote:
> > On Wed, 2025-01-15 at 15:53 -0500, Steve Dickson wrote:
> > > 
> > > On 1/15/25 1:33 PM, Jeff Layton wrote:
> > > > On Wed, 2025-01-15 at 12:47 -0500, Steve Dickson wrote:
> > > > > 
> > > > > On 1/15/25 12:35 PM, Jeff Layton wrote:
> > > > > > On Wed, 2025-01-15 at 12:32 -0500, Steve Dickson wrote:
> > > > > > > 
> > > > > > > On 1/15/25 12:00 PM, Scott Mayhew wrote:
> > > > > > > > Move read_nfsd_conf() out of autostart_func() and into main().  Remove
> > > > > > > > hard-coded NFSD_FAMILY_NAME in the first error message in
> > > > > > > > netlink_msg_alloc() and make the error messages in netlink_msg_alloc()
> > > > > > > > more descriptive/unique.
> > > > > > > > 
> > > > > > > > Signed-off-by: Scott Mayhew <smayhew@xxxxxxxxxx>
> > > > > > > > ---
> > > > > > > > SteveD - this would go on top of Jeff's "nfsdctl: add support for new
> > > > > > > > lockd configuration interface" patches.
> > > > > > > Got it...
> > > > > > > 
> > > > > > > > 
> > > > > > > >      utils/nfsdctl/nfsdctl.c | 8 ++++----
> > > > > > > >      1 file changed, 4 insertions(+), 4 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/utils/nfsdctl/nfsdctl.c b/utils/nfsdctl/nfsdctl.c
> > > > > > > > index 003daba5..f81c78ae 100644
> > > > > > > > --- a/utils/nfsdctl/nfsdctl.c
> > > > > > > > +++ b/utils/nfsdctl/nfsdctl.c
> > > > > > > > @@ -436,7 +436,7 @@ static struct nl_msg *netlink_msg_alloc(struct nl_sock *sock, const char *family
> > > > > > > >      
> > > > > > > >      	id = genl_ctrl_resolve(sock, family);
> > > > > > > >      	if (id < 0) {
> > > > > > > > -		xlog(L_ERROR, "%s not found", NFSD_FAMILY_NAME);
> > > > > > > > +		xlog(L_ERROR, "failed to resolve %s generic netlink family", family);
> > > > > > > >      		return NULL;
> > > > > > > >      	}
> > > > > > > >      
> > > > > > > > @@ -447,7 +447,7 @@ static struct nl_msg *netlink_msg_alloc(struct nl_sock *sock, const char *family
> > > > > > > >      	}
> > > > > > > >      
> > > > > > > >      	if (!genlmsg_put(msg, 0, 0, id, 0, 0, 0, 0)) {
> > > > > > > > -		xlog(L_ERROR, "failed to allocate netlink message");
> > > > > > > > +		xlog(L_ERROR, "failed to add generic netlink headers to netlink message");
> > > > > > > >      		nlmsg_free(msg);
> > > > > > > >      		return NULL;
> > > > > > > >      	}
> > > > > > > > @@ -1509,8 +1509,6 @@ static int autostart_func(struct nl_sock *sock, int argc, char ** argv)
> > > > > > > >      		}
> > > > > > > >      	}
> > > > > > > >      
> > > > > > > > -	read_nfsd_conf();
> > > > > > > > -
> > > > > > > >      	grace = conf_get_num("nfsd", "grace-time", 0);
> > > > > > > >      	ret = lockd_configure(sock, grace);
> > > > > > > >      	if (ret) {
> > > > > > > > @@ -1728,6 +1726,8 @@ int main(int argc, char **argv)
> > > > > > > >      	xlog_syslog(0);
> > > > > > > >      	xlog_stderr(1);
> > > > > > > >      
> > > > > > > > +	read_nfsd_conf();
> > > > > > > > +
> > > > > > > >      	/* Parse the preliminary options */
> > > > > > > >      	while ((opt = getopt_long(argc, argv, "+hdsV", pre_options, NULL)) != -1) {
> > > > > > > >      		switch (opt) {
> > > > > > > Ok... at this point we a prettier error message
> > > > > > > $ nfsdctl nlm
> > > > > > > nfsdctl: failed to resolve lockd generic netlink family
> > > > > > > 
> > > > > > > But the point of this argument is:
> > > > > > > 
> > > > > > > Get information about NLM (lockd) settings in the current net
> > > > > > > namespace. This subcommand takes no arguments.
> > > > > > > 
> > > > > > > How is that giving information from the running lockd?
> > > > > > > 
> > > > > > > What am I missing??
> > > > > > > 
> > > > > > 
> > > > > > You're missing a kernel that has the required netlink interface. To
> > > > > > test this properly, you'll need to patch your kernel, until that patch
> > > > > > makes it upstream.
> > > > > Okay... I figured it was something like that. But doesn't make sense to
> > > > > wait until the patch is in upstream so the argument can be properly
> > > > > tested? Why add an argument that will always fail?
> > > > > 
> > > > 
> > > > Why can't it be properly tested? It's just a matter of running a more
> > > > recent kernel that has the right interfaces. That should be in linux-
> > > > next soon (if not already).
> > > I'm doing my testing on a 6.13.0-0.rc6 which will soon be
> > > a 6.14 kernel... its my understanding the needed kernel
> > > patch will be in the 6.15 kernel... Please correct me
> > > if that is not true.
> > > 
> > > > 
> > > > I think the question is whether we want to wait until the kernel
> > > > interfaces trickle out into downstream distro kernels before we ship
> > > > any userland support in an upstream project (nfs-utils).
> > > Yes! As soon as the kernel support hits the upstream kernel,
> > > we will be good to go. I just don't want to put a feature
> > > in that will fail %100 of the time.
> > > 
> > > > 
> > > > If you want to wait until it hits Fedora Rawhide kernels, then you're
> > > > looking at about 10-12 weeks from now. If you want to wait until it
> > > > makes it into a stable Fedora release kernel then we're looking at
> > > > about 6 months from now.
> > > nfsdctl is in all current Fedora stable releases, which
> > > is the reason I'm pushing back. I do not want to put something
> > > in that will make it fail. That just does not make sense to me.
> > > 
> > > > 
> > > > I'll note that that it took 6 months to get the original nfsdctl
> > > > patches merged because of the lag on kernel patches making it into
> > > > distros, and I think that was way too long.
> > > It took that long because there were issues with the command.
> > > In which I was glad to help debug some of the issues...
> > > 
> > > New technology takes time to develop... I just think this
> > > is one of those cases.
> > > 
> > 
> > Ok, your call. To be clear though, that patch is part of my solution
> > for this bug.
> > 
> >      https://issues.redhat.com/browse/RHEL-71698
> > 
> > If you're going to delay it for several months, then can I trouble you
> > to come up with a fix for it that you find acceptable?
> How is this a fix when the subcommand will not work
> without the kernel patch?
> 

The nfs-server.service file defines this:

    ExecStart=/bin/sh -c '/usr/sbin/nfsdctl autostart || /usr/sbin/rpc.nfsd' 

When the lockd netlink interface is needed, but isn't available, then
startup will fall back to just calling rpc.nfsd. Currently, the
grace_period setting is just ignored, so that fallback just doesn't
happen. Very few people will need this; only those that set lockd's
ports, or that set the grace_period.

> I'm sure the subcommand works with the kernel patch
> but without it... what's the point?

-- 
Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux