RE: [PATCH v1] xprtrdma: Fix calculation of ri_max_send_sges

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx]
> Sent: Wednesday, January 31, 2018 5:04 PM
> To: Kalderon, Michal <Michal.Kalderon@xxxxxxxxxx>
> Cc: linux-rdma@xxxxxxxxxxxxxxx; Linux NFS Mailing List <linux-
> nfs@xxxxxxxxxxxxxxx>
> Subject: Re: [PATCH v1] xprtrdma: Fix calculation of ri_max_send_sges
> 
> 
> 
> > On Jan 31, 2018, at 9:04 AM, Kalderon, Michal
> <Michal.Kalderon@xxxxxxxxxx> wrote:
> >
> >> From: Chuck Lever [mailto:chucklever@xxxxxxxxx] On Behalf Of Chuck
> >> Lever
> >> Sent: Tuesday, January 30, 2018 10:54 PM
> >> To: Kalderon, Michal <Michal.Kalderon@xxxxxxxxxx>
> >> Cc: linux-rdma@xxxxxxxxxxxxxxx; linux-nfs@xxxxxxxxxxxxxxx
> >> Subject: [PATCH v1] xprtrdma: Fix calculation of ri_max_send_sges
> >>
> >> Commit 16f906d66cd7 ("xprtrdma: Reduce required number of send
> >> SGEs") introduced the rpcrdma_ia::ri_max_send_sges field. This fixes
> >> a problem where xprtrdma would not work if the device's max_sge
> >> capability was small (low single digits).
> >>
> >> At least RPCRDMA_MIN_SEND_SGES are needed for the inline parts of
> >> each RPC. ri_max_send_sges is set to this value:
> >>
> >>  ia->ri_max_send_sges = max_sge - RPCRDMA_MIN_SEND_SGES;
> >>
> >> Then when marshaling each RPC, rpcrdma_args_inline uses that value to
> >> determine whether the device has enough Send SGEs to convey an NFS
> >> WRITE payload inline, or whether instead a Read chunk is required.
> >>
> >> More recently, commit ae72950abf99 ("xprtrdma: Add data structure to
> >> manage RDMA Send arguments") used the ri_max_send_sges value to
> >> calculate the size of an array, but that commit erroneously assumed
> >> ri_max_send_sges contains a value similar to the device's max_sge,
> >> and not one that was reduced by the minimum SGE count.
> >>
> >> This assumption results in the calculated size of the sendctx's Send
> >> SGE array to be too small. When the array is used to marshal an RPC,
> >> the code can write Send SGEs into the following sendctx element in that
> array, corrupting it.
> >> When the device's max_sge is large, this issue is entirely harmless;
> >> but it results in an oops in the provider's post_send method, if
> >> dev.attrs.max_sge is small.
> >>
> >> So let's straighten this out: ri_max_send_sges will now contain a
> >> value with the same meaning as dev.attrs.max_sge, which makes the
> >> code easier to understand, and enables rpcrdma_sendctx_create to
> >> calculate the size of the SGE array correctly.
> >>
> >> Reported-by: Michal Kalderon <Michal.Kalderon@xxxxxxxxxx>
> >> Fixes: 16f906d66cd7 ("xprtrdma: Reduce required number of send SGEs")
> >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >> ---
> >> net/sunrpc/xprtrdma/rpc_rdma.c |    2 +-
> >> net/sunrpc/xprtrdma/verbs.c    |    2 +-
> >> 2 files changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
> >> b/net/sunrpc/xprtrdma/rpc_rdma.c index 162e5dd..f0855a9 100644
> >> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> >> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> >> @@ -143,7 +143,7 @@ static bool rpcrdma_args_inline(struct
> >> rpcrdma_xprt *r_xprt,
> >> 	if (xdr->page_len) {
> >> 		remaining = xdr->page_len;
> >> 		offset = offset_in_page(xdr->page_base);
> >> -		count = 0;
> >> +		count = RPCRDMA_MIN_SEND_SGES;
> >> 		while (remaining) {
> >> 			remaining -= min_t(unsigned int,
> >> 					   PAGE_SIZE - offset, remaining); diff
> --git
> >> a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index
> >> f4eb63e..bb56b9d 100644
> >> --- a/net/sunrpc/xprtrdma/verbs.c
> >> +++ b/net/sunrpc/xprtrdma/verbs.c
> >> @@ -505,7 +505,7 @@
> >> 		pr_warn("rpcrdma: HCA provides only %d send SGEs\n",
> max_sge);
> >> 		return -ENOMEM;
> >> 	}
> >> -	ia->ri_max_send_sges = max_sge - RPCRDMA_MIN_SEND_SGES;
> >> +	ia->ri_max_send_sges = max_sge;
> >>
> >> 	if (ia->ri_device->attrs.max_qp_wr <= RPCRDMA_BACKWARD_WRS)
> {
> >> 		dprintk("RPC:       %s: insufficient wqe's available\n",
> >
> > thanks, this fixed our first issue of mount failing.
> 
> May I add "Tested-by: Michal Kalderon <Michal.Kalderon@xxxxxxxxxx>" ?
> 
> --
> Chuck Lever
> 
> 
sure
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux