Re: [PATCH v1 26/27] SUNRPC: Set rq_accept_statp inside ->accept methods

Jeff Layton <jlayton@xxxxxxxxxx> · Tue, 16 May 2023 15:23:41 -0400

On Tue, 2023-05-02 at 14:14 +0000, Chuck Lever III wrote:
> 
> > On May 2, 2023, at 7:01 AM, Jiri Slaby <jirislaby@xxxxxxxxxx> wrote:
> > 
> > On 08. 01. 23, 17:31, Chuck Lever wrote:
> > > From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > To navigate around the space that svcauth_gss_accept() reserves
> > > for the RPC payload body length and sequence number fields,
> > > svcauth_gss_release() does a little dance with the reply's
> > > accept_stat, moving the accept_stat value in the response buffer
> > > down by two words.
> > > Instead, let's have the ->accept() methods each set the proper
> > > final location of the accept_stat to avoid having to move
> > > things.
> > 
> > Hi,
> > 
> > I bisected to this (4bcf0343e8)
> 
> Assuming you did the bisect on the NFS server's kernel?
> 
> 
> > as it breaks nfs3-only servers in 6.3. I.e. /etc/nfs.conf containing:
> > [nfsd]
> > vers4=no
> 
> Note: Changing the settings in /etc/nfs.conf had no effect
> on my server, so I effected the change by stopping the
> server and poking values into /proc/fs/nfsd/versions by
> hand.
> 
> Steve?
> 
> 
> > The client sees:
> >  mount("10.0.2.15:/tmp", "/mnt", "nfs", 0, "vers=4.2,addr=10.0.2.15,clientad"...) = -1 EIO (Input/output error)
> >  write(2, "mount.nfs: mount system call fai"..., 45
> >  mount.nfs: mount system call failed for /mnt
> > 
> > And the kernel says:
> >  nfs4_discover_server_trunking unhandled error -5. Exiting with error EIO
> > 
> > I reported in downstream as:
> > https://bugzilla.suse.com/show_bug.cgi?id=1210995
> > 
> > It cannot be reverted cleanly on the top of 6.3.
> > 
> > Any ideas?
> 
> I can reproduce a similar problem. Network capture shows
> that the server is responding with NFS4ERR_NOENT to the
> EXCHANGE_ID operation, and the client kernel log says:
> 
> >  nfs4_discover_server_trunking unhandled error -121. Exiting with error EIO
> 
> That's not the failure mode I expected given the commit
> you bisected to, so it might not be the same problem you've
> hit. I'll troubleshoot this and send a fix for testing.
> 

Alex hit this problem in testing too, and I took a quick look.

In the attached capture, the client should have gotten back an
RPC_PROG_MISMATCH error, but the server has recorded an extra successful
accept status before encoding the RPC_PROG_MISMATCH error, leading to a
malformed reply.

I think the problem is that encoding the accept status too early
means that we can't properly handle failures from the pg_init_request
call.
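
To make the malformed reply concrete, here is a minimal Python sketch of an
accepted RPC reply as laid out in RFC 5531. This is not the kernel code: the
helper names, the AUTH_NULL verifier, the XID, and the version range 3..3 are
all assumptions picked for illustration, and the exact bytes in the capture
may differ.

```python
import struct

# Constants from RFC 5531 (ONC RPC v2).
MSG_REPLY = 1       # msg_type REPLY
MSG_ACCEPTED = 0    # reply_stat MSG_ACCEPTED
AUTH_NULL = 0       # verifier flavor (assumed here for simplicity)
SUCCESS = 0         # accept_stat SUCCESS
PROG_MISMATCH = 2   # accept_stat PROG_MISMATCH


def encode_reply_header(xid):
    """Hypothetical helper: xid, msg_type, reply_stat, empty AUTH_NULL verifier."""
    return struct.pack(">5I", xid, MSG_REPLY, MSG_ACCEPTED, AUTH_NULL, 0)


def good_prog_mismatch(xid, low, high):
    """Well-formed reply: one accept_stat word, then the supported version range."""
    return encode_reply_header(xid) + struct.pack(">3I", PROG_MISMATCH, low, high)


def bad_prog_mismatch(xid, low, high):
    """Malformed reply: a SUCCESS accept_stat was written early, and the
    PROG_MISMATCH error was appended after it instead of replacing it."""
    early = struct.pack(">I", SUCCESS)
    return encode_reply_header(xid) + early + struct.pack(">3I", PROG_MISMATCH, low, high)


def decode_accept_stat(reply):
    """Return (accept_stat, remaining bytes) the way a receiver would read them."""
    _xid, _mtype, _rstat, _flavor, vlen = struct.unpack_from(">5I", reply, 0)
    offset = 20 + ((vlen + 3) & ~3)  # skip the XDR-padded verifier body
    (accept_stat,) = struct.unpack_from(">I", reply, offset)
    return accept_stat, reply[offset + 4:]


if __name__ == "__main__":
    for name, build in (("good", good_prog_mismatch), ("bad", bad_prog_mismatch)):
        stat, rest = decode_accept_stat(build(0x1234, 3, 3))
        print(name, "accept_stat =", stat, "trailing bytes =", len(rest))
```

With the stray word present, a receiver reads SUCCESS as the accept status
and is left with three unexpected words where the procedure results should
be, which is one way the reply could end up malformed.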

Chuck, any thoughts on how you'd like to handle this?
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
Attachment: bad-fallback.pcapng.gz
Description: application/pcapng