On Tue, 2023-05-02 at 14:14 +0000, Chuck Lever III wrote:
>
> > On May 2, 2023, at 7:01 AM, Jiri Slaby <jirislaby@xxxxxxxxxx> wrote:
> >
> > On 08. 01. 23, 17:31, Chuck Lever wrote:
> > > From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > > To navigate around the space that svcauth_gss_accept() reserves
> > > for the RPC payload body length and sequence number fields,
> > > svcauth_gss_release() does a little dance with the reply's
> > > accept_stat, moving the accept_stat value in the response buffer
> > > down by two words.
> > > Instead, let's have the ->accept() methods each set the proper
> > > final location of the accept_stat to avoid having to move
> > > things.
> >
> > Hi,
> >
> > I bisected to this (4bcf0343e8)
>
> Assuming you did the bisect on the NFS server's kernel?
>
> > as it breaks nfs3-only servers in 6.3. I.e. /etc/nfs.conf containing:
> > [nfsd]
> > vers4=no
>
> Note: Changing the settings in /etc/nfs.conf had no effect
> on my server, so I effected the change by stopping the
> server and poking values into /proc/fs/nfsd/versions by
> hand.
>
> Steve?
>
> > The client sees:
> > mount("10.0.2.15:/tmp", "/mnt", "nfs", 0, "vers=4.2,addr=10.0.2.15,clientad"...) = -1 EIO (Input/output error)
> > write(2, "mount.nfs: mount system call fai"..., 45
> > mount.nfs: mount system call failed for /mnt
> >
> > And the kernel says:
> > nfs4_discover_server_trunking unhandled error -5. Exiting with error EIO
> >
> > I reported in downstream as:
> > https://bugzilla.suse.com/show_bug.cgi?id=1210995
> >
> > It cannot be reverted cleanly on the top of 6.3.
> >
> > Any ideas?
>
> I can reproduce a similar problem. Network capture shows
> that the server is responding with NFS4ERR_NOENT to the
> EXCHANGE_ID operation, and the client kernel log says:
>
> > nfs4_discover_server_trunking unhandled error -121. Exiting with error EIO
>
> That's not the failure mode I expected given the commit
> you bisected to, so it might not be the same problem you've
> hit. I'll troubleshoot this and send a fix for testing.
>

Alex hit this problem in testing too, and I took a quick look.

In the attached capture, the client should have gotten back a
RPC_PROG_MISMATCH error, but the server has recorded an extra successful
accept state before encoding the RPC_PROG_MISMATCH error, leading to a
malformed reply.

I think that the problem is that encoding the accept status too early
means that we can't properly handle failures from the pg_init_request
call.

Chuck, any thoughts on how you'd like to handle this?
--
Jeff Layton <jlayton@xxxxxxxxxx>
Attachment:
bad-fallback.pcapng.gz
Description: application/pcapng
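
For anyone following along without the capture, here is a rough sketch of what an extra accept status ahead of the mismatch error would do to the reply on the wire. This is not the kernel code and not taken from the pcap. It only follows the accepted-reply layout in RFC 5531, and it assumes an empty AUTH_NONE verifier. The xid and the low/high version numbers are made-up values, and the function names are hypothetical.

```python
import struct

# Constants from RFC 5531 (ONC RPC v2).
REPLY = 1
MSG_ACCEPTED = 0
AUTH_NONE = 0
SUCCESS = 0
PROG_MISMATCH = 2


def reply_header(xid):
    """Encode xid, msg_type, reply_stat, and an empty AUTH_NONE verifier."""
    return struct.pack(">5I", xid, REPLY, MSG_ACCEPTED, AUTH_NONE, 0)


def prog_mismatch_reply(xid, low, high, early_success=False):
    """Build a PROG_MISMATCH reply (hypothetical helper).

    With early_success=True, a SUCCESS accept_stat is written before the
    error is known, mimicking the ordering problem suspected above.
    """
    buf = reply_header(xid)
    if early_success:
        buf += struct.pack(">I", SUCCESS)
    buf += struct.pack(">3I", PROG_MISMATCH, low, high)
    return buf


def decode_accept_stat(buf):
    """Return (accept_stat, remaining words) the way a client would read them."""
    words = struct.unpack(">%dI" % (len(buf) // 4), buf)
    # words[0..4]: xid, msg_type, reply_stat, verifier flavor, verifier length
    return words[5], words[6:]


# xid and version range are illustrative values only.
good = prog_mismatch_reply(0x1234, low=3, high=3)
bad = prog_mismatch_reply(0x1234, low=3, high=3, early_success=True)
print(decode_accept_stat(good))  # (2, (3, 3))
print(decode_accept_stat(bad))   # (0, (2, 3, 3))
```

In the second case the client reads accept_stat as SUCCESS and treats the remaining words as the procedure's results. The first leftover word is 2, which is also the numeric value of NFS4ERR_NOENT, so this might be what produces the NFS4ERR_NOENT seen on EXCHANGE_ID, though that reading is a guess.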