From: Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote: > > On Jul 11, 2022, at 1:33 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote: > > > > > > > >> On Jul 11, 2022, at 11:01 AM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote: > >> > >> > >> > >>> On Jul 10, 2022, at 6:10 PM, Rick Macklem <rmacklem@xxxxxxxxxxx> wrote: > >>> > >>> Hi, > >>> > >>> I have been trying to improve the behaviour of the FreeBSD > >>> NFSv4.1/4.2 client when using the "intr" mount option. > >>> > >>> I have come up with the following scheme: > >>> - When RPCs are interrupted, mark the session slot as potentially bad. > >>> - When all session slots are marked potentially bad, do a > >>> DestroySession (only op in RPC) to destroy the session. > >>> - When the server replies NFS4ERR_BAD_SESSION, > >>> do a CreateSession (only op in RPC) to acquire a new session and > >>> continue on. > >>> > >>> When testing against a Linux 5.15 server, the CreateSession > >>> succeeds, but returns the same sessionid as the old session. > >>> Then all subsequent RPCs get the NFS4ERR_BAD_SESSION reply. > >>> (The client repeatedly does CreateSession RPCs that reply NFS_OK, > >>> but always with the same sessionid as the destroyed one.) > >>> > >>> Here's what I see in the packet trace: > >>> (everything works normally until all session slots are marked > >>> potentially bad at packet# 14216) > >>> packet# RPC > >>> 14216 DestroySession request for sessionid 2725cb62002ed418040...0 > >>> 14302 DestroySession reply NFS_OK > >>> 14304 Getattr request (using above sessionid) > >>> 14305 Getattr reply NFS4ERR_BAD_SESSION > >>> 14306 CreateSession request > >>> *** Now here is where I see a problem... > >>> 14307 CreateSession reply NFS_OK with sessionid > >>> 2725cb62002ed418040...0 (same as above) > >>> 14308 Getattr request (using above sessionid) > >>> 14309 Getattr reply NFS4ERR_BAD_SESSION > >>> - and then this just repeats... > >>> The whole packet trace can be found here, in case you are interested: > >>> https://people.freebsd.org/~rmacklem/linux.pcap > >>> > >>> It seems to me that a successful CreateSession should always return > >>> a new unique sessionid? > >> > >> Hi Rick, thanks for the bug report. > >> > >> CREATE_SESSION has a built-in reply cache to thwart replay attacks. > >> It can legitimately return the same sessionid as a previous request. > >> Granted, DESTROY_SESSION is supposed to wipe that reply cache... Well, I just re-read the RFC and I don't see anything that says DestroySession affects the CreateSession reply cache. I had thought it did, but I was wrong. See below... > >> > >> I'd like to see if there's a test in pynfs that replicates or is close > >> to the series of operations in your trace so that I can reproduce on > >> my lab systems and watch it fail up close. > > > > I constructed a pynfs test that does something similar to your > > reproducer: > > > > diff --git a/nfs4.1/server41tests/st_destroy_session.py b/nfs4.1/server41tests/st_destroy_session.py > > index b8be62582366..014330e7d623 100644 > > --- a/nfs4.1/server41tests/st_destroy_session.py > > +++ b/nfs4.1/server41tests/st_destroy_session.py > > @@ -1,12 +1,33 @@ > > from .st_create_session import create_session > > from xdrdef.nfs4_const import * > > -from .environment import check, fail, create_file, open_file > > +from .environment import check, fail, create_file, open_file, close_file > > from xdrdef.nfs4_type import open_owner4, openflag4, createhow4, open_claim4 > > import nfs_ops > > op = nfs_ops.NFS4ops() > > import threading > > import rpc.rpc as rpc > > > > +def testDestroyBasic(t, env): > > + """Ensure operations outside a session fail with BADSESSION > > + > > + FLAGS: destroy_session all > > + CODE: DSESS1 > > + """ > > + c = env.c1.new_client(env.testname(t)) > > + sess1 = c.create_session() > > + sess1.compound([op.reclaim_complete(FALSE)]) > > + res = c.c.compound([op.destroy_session(sess1.sessionid)]) > > + res = create_file(sess1, env.testname(t), > > + access=OPEN4_SHARE_ACCESS_READ) > > + check(res, NFS4ERR_BADSESSION) > > + sess2 = c.create_session() > > + res = create_file(sess2, env.testname(t), > > + access=OPEN4_SHARE_ACCESS_READ) > > + check(res) > > + fh = res.resarray[-1].object > > + open_stateid = res.resarray[-2].stateid > > + close_file(sess2, fh, stateid=open_stateid) > > + > > def testDestroy(t, env): > > """ > > - create a session > > > > I'm not able to reproduce the problem on 5.19-rc5, but that > > probably means there's something going on that we haven't > > discovered yet. Nope, your guess below was correct. It is a FreeBSD client bug. > My guess is that your client is sending CREATE_SESSION operations > with the same sequence ID (1) and that is hitting in NFSD's > CREATE_SESSION reply cache. So it's treating the client's new > requests as replays and returning an old (stale) sessionid. Yep. For some reason, I thought that a DestroySeseesion put the sequence# back. On re-reading the RFC, there is nothing mentioning that. Now that I've patched the client to increment the sequence#, it works fine against the Linux server. Sorry about the noise (hopefully your pynfs test is still useful). Thanks for the help, rick ps: Now I have to fix a deficiency in the FreeBSD server. Right now, it only checks the sequence# for an unconfirmed ClientID, but that is obviously a FreeBSD bug, too. -- Chuck Lever