Re: mount options not propagating to NFSACL and NSM RPC clients

Dan Aloni <dan.aloni@xxxxxxxxxxxx> · Wed, 10 Apr 2024 17:39:44 +0300

On 2023-11-30 09:30:52, Benjamin Coddington wrote:
> > Actually my concern is the NFSACL prog. With `cl_softrtry == 1` and
> > `to_initval == to_maxval`, does it mean retries will not happen
> > regardless of `to_retries` and `to_increment`?
> 
> Possibly?  I'm not exactly certain of what should happen in that case.
> 
> > I encountered a situation where the NFSACL program did not retry but
> > could have, whereas NFS3 did so successfully. Not sure regarding NSM,
> > but it seems to me that it would make sense at least for NFSACL to
> > behave the same as NFS3.
> 
> I agree, but I could be missing something -- maybe it's a bug.  There's the
> sunrpc:rpc_timeout_status tracepoint that might be helpful.  If you turn
> that up can you see rpc_check_timeout() getting called from
> call_transmit_status()?

Sorry, it took a while to get a test working while busy with other stuff.

It really does look like a bug. Here are the details.

Server: nfsd with extra fault injection code that calls `svc_drop()` only once
on a single NFS GETACL request.
Client: Linux v6.8, NFS mount with `soft,timeo=50,retrans=16,vers=3`.
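
For reference, nfs(5) documents `timeo` in tenths of a second, so
`timeo=50` means a 5-second initial timeout. A small Python sketch of that
arithmetic follows; `parse_mount_opts` is a hypothetical helper written for
illustration, not part of any NFS tooling:

```python
def parse_mount_opts(opts: str) -> dict:
    """Split a comma-separated mount option string into a dict.

    Hypothetical helper for illustration only: flag options such as
    `soft` map to True, `key=value` options keep the value as a string.
    """
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


opts = parse_mount_opts("soft,timeo=50,retrans=16,vers=3")

# nfs(5): timeo is expressed in deciseconds (tenths of a second).
initial_timeout_s = int(opts["timeo"]) / 10
retrans = int(opts["retrans"])

print(f"soft={opts['soft']} initial timeout={initial_timeout_s:g}s retrans={retrans}")
```

With these options, a request that is dropped once would be expected to
time out after about 5 seconds and then be retransmitted.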

I trace client execution with the following:

    sudo perf trace -e sunrpc:rpc_task_timeout -e sunrpc:xprt_retransmit

A simple `ls -l` gets stuck and then shows an I/O error:

    [root@client export]# ls -l
    ls: file: Input/output error
    total 0
    -rw-r--r-- 1 root root 0 Apr 10 10:02 file

I get a single event out of the tracing above:

```
kthreadd/7926 sunrpc:rpc_task_timeout(task_id: 203, client_id: 6, xprt_id: 3, action: 0xffffffffc0accc60, runstate: 22, flags: 35456)
```
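
To make the numeric fields of that event easier to read, here is a rough
Python sketch that splits the line into fields and prints `runstate` and
`flags` in hex. The `parse_event` helper is hypothetical, and the sketch
deliberately does not map bits to names: the `RPC_TASK_*` definitions
should be checked in `include/linux/sunrpc/sched.h` for the kernel in use.

```python
import re

EVENT = ("kthreadd/7926 sunrpc:rpc_task_timeout(task_id: 203, client_id: 6, "
         "xprt_id: 3, action: 0xffffffffc0accc60, runstate: 22, flags: 35456)")


def parse_event(line: str) -> dict:
    """Parse one tracepoint line of the form `comm/pid sys:event(k: v, ...)`."""
    m = re.search(r"(\w+:\w+)\((.*)\)\s*$", line)
    if m is None:
        raise ValueError("not a tracepoint line")
    fields = {}
    for pair in m.group(2).split(", "):
        key, _, value = pair.partition(": ")
        # Base 0 lets int() accept both decimal and 0x-prefixed values.
        fields[key] = int(value, 0)
    return {"name": m.group(1), "fields": fields}


event = parse_event(EVENT)
print(event["name"])
print("runstate =", hex(event["fields"]["runstate"]))
print("flags    =", hex(event["fields"]["flags"]))
```

For the event above this prints `flags` as `0x8a80` and `runstate` as `0x16`.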

So it looks like the request is not being retransmitted. Just to be sure,
if I cause the nfsd to drop regular NFS3 prog requests like ACCESS and
LOOKUP, I only get the expected 5-second delay, followed by a successful
retry.

It seems we only have an issue with the NFSACL prog.
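
As a way to reason about the `to_initval == to_maxval` question quoted
above, here is a toy Python model. It is not kernel code. It assumes that
a soft task fails once the major timeout elapses, and that the major
timeout is the linear backoff total capped at `to_maxval`. The names
`RpcTimeout`, `major_timeout`, and `has_retry_window`, and the value 85 in
the second case, are made up for illustration. Whether the NFSACL client
really ends up with such values is not established here.

```python
from dataclasses import dataclass


@dataclass
class RpcTimeout:
    """Toy stand-in for the timeout parameters discussed in the thread (seconds)."""
    to_initval: float
    to_maxval: float
    to_increment: float
    to_retries: int


def major_timeout(to: RpcTimeout) -> float:
    # Assumption: linear backoff total, capped at to_maxval.
    linear = to.to_initval + to.to_increment * to.to_retries
    return min(linear, to.to_maxval)


def has_retry_window(to: RpcTimeout) -> bool:
    # Assumption: a soft task gives up when the major timeout elapses, so a
    # retransmit can only happen if the first (minor) timeout fires earlier.
    return major_timeout(to) > to.to_initval


collapsed = RpcTimeout(to_initval=5, to_maxval=5, to_increment=5, to_retries=16)
widened = RpcTimeout(to_initval=5, to_maxval=85, to_increment=5, to_retries=16)

for label, to in (("initval == maxval", collapsed), ("maxval > initval", widened)):
    print(f"{label}: major={major_timeout(to):g}s retry window={has_retry_window(to)}")
```

Under these assumptions, the first case leaves no room for a retransmit
regardless of `to_retries` and `to_increment`, which would be consistent
with the single `rpc_task_timeout` event seen above.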

-- 
Dan Aloni