Re: [PATCH] SUNRPC: backchannel RPC request must reference XPRT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 15 2019, Trond Myklebust wrote:

> On Wed, 2019-10-16 at 10:23 +1100, NeilBrown wrote:
>> On Tue, Oct 15 2019, Trond Myklebust wrote:
>> 
>> > Hi Neil,
>> > 
>> > On Tue, 2019-10-15 at 10:36 +1100, NeilBrown wrote:
>> > > The backchannel RPC requests - that are queued waiting
>> > > for the reply to be sent by the "NFSv4 callback" thread -
>> > > have a pointer to the xprt, but it is not reference counted.
>> > > It is possible for the xprt to be freed while there are
>> > > still queued requests.
>> > > 
>> > > I think this has been a problem since
>> > > Commit fb7a0b9addbd ("nfs41: New backchannel helper routines")
>> > > when the code was introduced, but I suspect it became more of
>> > > a problem after
>> > > Commit 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
>> > > transports")
>> > > (or there abouts).
>> > > Before this second patch, the nfs client would hold a reference
>> > > to
>> > > the xprt to keep it alive.  After multipath was introduced,
>> > > a client holds a reference to a swtich, and the switch can have
>> > > multiple
>> > > xprts which can be added and removed.
>> > > 
>> > > I'm not sure of all the causal issues, but this patch has
>> > > fixed a customer problem were an NFSv4.1 client would run out
>> > > of memory with tens of thousands of backchannel rpc requests
>> > > queued for an xprt that had been freed.  This was a 64K-page
>> > > machine so each rpc_rqst consumed more than 128K of memory.
>> > > 
>> > > Fixes: 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
>> > > transports")
>> > > cc: stable@xxxxxxxxxxxxxxx (v4.6)
>> > > Signed-off-by: NeilBrown <neilb@xxxxxxx>
>> > > ---
>> > >  net/sunrpc/backchannel_rqst.c | 3 ++-
>> > >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > > 
>> > > diff --git a/net/sunrpc/backchannel_rqst.c
>> > > b/net/sunrpc/backchannel_rqst.c
>> > > index 339e8c077c2d..c95ca39688b6 100644
>> > > --- a/net/sunrpc/backchannel_rqst.c
>> > > +++ b/net/sunrpc/backchannel_rqst.c
>> > > @@ -61,6 +61,7 @@ static void xprt_free_allocation(struct
>> > > rpc_rqst
>> > > *req)
>> > >  	free_page((unsigned long)xbufp->head[0].iov_base);
>> > >  	xbufp = &req->rq_snd_buf;
>> > >  	free_page((unsigned long)xbufp->head[0].iov_base);
>> > > +	xprt_put(req->rq_xprt);
>> > >  	kfree(req);
>> > >  }
>> > 
>> > Would it perhaps make better sense to move the xprt_get() to
>> > xprt_lookup_bc_request() and to release it in xprt_free_bc_rqst()? 
>> 
>> ... maybe ...
>> 
>> > Otherwise as far as I can tell, we will have freed slots on the
>> > xprt-
>> > > bc_pa_list that hold a reference to the transport itself, meaning
>> > > that
>> > the latter never gets released.
>> 
>> Apart from cleanup-on-error paths, xprt_free_allocation() is called:
>>  - in xprt_destroy_bc() - at most 'max_reqs' times.
>>  - in xprt_free_bc_rqst(), if the bc_alloc_count >= bc_alloc_max
>> 
>> So when xprt_destroy_bc() is called (from nfs4_destroy_session, via
>> xprt_destroy_backchannel()), bc_alloc_max() is reduced, and possibly
>> the requests are freed and bc_alloc_count is reduced accordingly.
>> If the requests were busy, they won't have been freed then, but will
>> then be freed when xprt_free_bc_reqst is called - because
>> bc_alloc_max
>> has been reduced.
>> 
>> Once nfs4_destroy_session() has been called on all session, and
>> xprt_free_bc_rqst() has been called on all active requests, I think
>> they should be no requests left to be holding a reference to
>> the xprt.
>> 
>> And if there were requests left, and if we change the refcount code
>> (like you suggest) so that they weren't holding a reference, then
>> they
>> would never be freed. - freeing an xprt doesn't clean out the
>> bc_pa_list.
>> 
>> Though ... the connection from a session to an xprt isn't permanent
>> (I
>> guess, based on the rcu_read_lock in nfs4_destroy_session... which
>> itself seems a bit odd as it doesn't inc a refcount while holding the
>> lock).
>> 
>> So maybe the counts can get out of balance.
>> 
>> Conclusion: I'm not sure the ref counts are entirely correct, but I
>>    cannot see a benefit from moving the xprt_get/put requests like
>> you
>>    suggest.
>> 
>
> I don't buy that conclusion.
>
> Nothing stops me from changing the value of NFS41_BC_MIN_CALLBACKS to
> zero. Why should that affect the existence or not of the transport by
> changing the number of references held? That parameter is supposed to
> determine the number of pre-allocated requests and nothing else.

If you reduce NFS41_BC_MIN_CALLBACKS to zero, bc_alloc_max won't be
increased and will remain at zero.  So xprt_get_bc_request() will
always need to allocate a new request (bc_pa_list will be empty),
and xprt_free_bc_rqst() will always free the request - never putting it
on the bc_pa_list.

Maybe it would make sense to do exactly that, get rid of bc_pa_list, and
possibly use a mempool if there is any concern about delays when
allocating.

I agree that it is a little odd for requests on the pa list to hold a
reference to the xprt, and I don't object at all to changing that.
My only point is that I think my patch as it stands is correct, that it
doesn't introduce new problems (and so might be the simplest thing for
-stable).

Thanks,
NeilBrown


>
> I do agree with your assessment that destroying the xprt does not
> currrently destroy the contents of xprt->bc_pa_list if they are non-
> zero, but that would be a bug, and an easy one to fix.
>
>
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux