On 2023-11-30 09:30:52, Benjamin Coddington wrote:
> > Actually my concern is the NFSACL prog. With `cl_softrtry == 1` and
> > `to_initval == to_maxval`, does it mean retries will not happen
> > regardless of `to_retries` and `to_increment`?
>
> Possibly? I'm not exactly certain of what should happen in that case.
>
> > I encountered a situation where the NFSACL program did not retry but
> > could have, whereas NFS3 did retry successfully. Not sure regarding NSM,
> > but it seems to me that it would make sense at least for NFSACL to
> > behave the same as NFS3.
>
> I agree, but I could be missing something -- maybe it's a bug. There's the
> sunrpc:rpc_timeout_status tracepoint that might be helpful. If you turn
> that up can you see rpc_check_timeout() getting called from
> call_transmit_status()?

Sorry, it took a while to get a test working while I was busy with other
stuff. It really does look like a bug; here are the details.

Server: nfsd with extra fault-injection code that calls `svc_drop()` only
once, on a single NFS GETACL request.

Client: Linux v6.8, NFS mount with `soft,timeo=50,retrans=16,vers=3`.

I trace client execution with the following:

    sudo perf trace -e sunrpc:rpc_task_timeout -e sunrpc:xprt_retransmit

A simple `ls -l` hangs and then fails with an I/O error:

    [root@client export]# ls -l
    ls: file: Input/output error
    total 0
    -rw-r--r-- 1 root root 0 Apr 10 10:02 file

I get a single event out of the tracing above:

```
kthreadd/7926 sunrpc:rpc_task_timeout(task_id: 203, client_id: 6, xprt_id: 3, action: 0xffffffffc0accc60, runstate: 22, flags: 35456)
```

So it looks like the request is never retransmitted: there is no
sunrpc:xprt_retransmit event at all.

Just to be sure: if I make nfsd drop regular NFS3 program requests such as
ACCESS and LOOKUP, I only get the expected 5-second delay, followed by a
successful retry. It seems we only have an issue with the NFS3ACL program.

-- 
Dan Aloni
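To make the quoted `to_initval == to_maxval` question concrete, here is a simplified model of linear (non-exponential) RPC timeout arithmetic. It is an approximation, not the kernel code: it assumes the major timeout is `to_initval + to_increment * to_retries` capped at `to_maxval`, that each minor timeout grows by `to_increment` up to `to_maxval`, and that a soft task fails at the major timeout instead of retransmitting. The `RpcTimeout` class, the `retransmits_before_major()` helper and all example values are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RpcTimeout:
    """Hypothetical stand-in for the linear-mode fields of struct rpc_timeout.

    All values share one arbitrary time unit (e.g. tenths of a second).
    """
    to_initval: int
    to_maxval: int
    to_increment: int
    to_retries: int


def retransmits_before_major(to: RpcTimeout) -> int:
    """Count retransmissions a soft task gets before its major timeout (simplified model)."""
    if to.to_initval <= 0:
        raise ValueError("to_initval must be positive")
    # Assumed: major timeout = initval + increment * retries, capped at maxval.
    major = min(to.to_initval + to.to_increment * to.to_retries, to.to_maxval)
    elapsed = 0
    timeout = to.to_initval
    retransmits = 0
    while True:
        elapsed += timeout  # a minor timeout expires
        if elapsed >= major:
            # Major timeout reached: in this model a soft task gives up here
            # instead of retransmitting.
            return retransmits
        retransmits += 1
        # Assumed: each minor timeout grows by to_increment, capped at to_maxval.
        timeout = min(timeout + to.to_increment, to.to_maxval)


if __name__ == "__main__":
    # to_initval == to_maxval: no retransmission, whatever to_retries/to_increment are.
    print(retransmits_before_major(RpcTimeout(600, 600, 0, 2)))    # 0
    print(retransmits_before_major(RpcTimeout(600, 600, 50, 16)))  # 0
    # Headroom between to_initval and to_maxval: retransmissions do happen.
    print(retransmits_before_major(RpcTimeout(50, 850, 50, 16)))
```

Under this model, whenever `to_initval == to_maxval` the very first minor timeout already reaches the major timeout, so a soft task gets zero retransmissions regardless of `to_retries` and `to_increment`. That would be consistent with seeing a single rpc_task_timeout event and no xprt_retransmit event for the GETACL call.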