On 4 Jan 2024, at 7:13, Benjamin Coddington wrote: > On 4 Jan 2024, at 6:55, Benjamin Coddington wrote: > >> On 3 Jan 2024, at 18:00, Trond Myklebust wrote: >> >>> On Wed, 2024-01-03 at 16:45 -0500, Anna Schumaker wrote: >>>> Hi Ben, >>>> >>>> On Wed, Dec 13, 2023 at 2:49 PM Benjamin Coddington >>>> <bcodding@xxxxxxxxxx> wrote: >>>>> >>>>> For backchannel requests that lookup the appropriate nfs_client, >>>>> use the >>>>> state-management rpc_clnt's rpc_timeout parameters for the >>>>> backchannel's >>>>> response. When the nfs_client cannot be found, fall back to using >>>>> the >>>>> xprt's default timeout parameters. >>>> >>>> I'm seeing a use-after-free after applying this patch when using pNFS >>>> and session trunking. Any idea what's going on? Here is the stack >>>> trace I'm seeing: >>> >>> I'm going to guess that this is happening because nothing is clearing >>> rqstp->bc_rpc_clnt after a call to svc_process_bc(). So if something >>> later calls CB_NULL, then the resulting svc_process_bc() will free an >>> extra reference. >> >> Doh! >> >>>> >>>> I hit this while testing against ontap, if that helps. >>>> >>>> Thanks, >>>> Anna >> >> Thank you for the test!! >> >>>>> --- a/fs/nfs/callback_xdr.c >>>>> +++ b/fs/nfs/callback_xdr.c >>>>> @@ -967,6 +967,9 @@ static __be32 nfs4_callback_compound(struct >>>>> svc_rqst *rqstp) >>>>> nops--; >>>>> } >>>>> >>>>> + if (svc_is_backchannel(rqstp) && cps.clp) >>>>> + rqstp->bc_rpc_clnt = cps.clp->cl_rpcclient; >>> >>> >>> You can re-create the clnt->cl_timeout in svc_process_bc() by just >>> storing the values of to_initval and to_retrans here. Why store a >>> reference to an entire rpc client structure that you don't need? >> >> Hmm, I think I started by thinking we could simply set tk_client, but then >> didn't end up with that for other reasons and just kept passing the single >> pointer. > > .. and the client being NULL was the signal to fallback to using the xprt > timeouts. That's lost with to_initval and to_retries because they can be > set to zero deliberately. I'm wrong about this ^^. The mount option timeo must be > 0, so we can check if to_initval is zero to see if we should fall back to the xprt default. Ben