Re: [PATCH v1] sunrpc: Allocate up to RPCSVC_MAXPAGES per svc_rqst

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 30, 2017 at 04:33:23PM -0400, Chuck Lever wrote:
> 
> > On Jun 30, 2017, at 4:16 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> > 
> > I like that better.
> > 
> > I'm still pretty confused.
> > 
> > The long comment above RPCSVC_MAXPAGES convinces me that it's
> > calculating the maximum number of pages a TCP request might need.  But
> > apparently rq_pages has never been large enough to hold that many pages.
> > Has nobody ever tried an unaligned 1MB read?
> 
> I'm not sure how the TCP transport works. If it tries to fill the
> head iovec completely, then goes into the page list, then it might
> not ever need the extra page on the end for the GETATTR.

I think an unaligned read of N pages ends up with rq_pages like:

	1st page: read request
	2nd page: head+tail of reply
	3rd page: partial first page of read data
	N-1 pages: read data
	last page: partial last page of read data

So, N+3 pages.

> For NFS/RDMA, the transport currently aligns the incoming data so
> it always starts at byte 0 of the first page in the xdr_buf's page
> list. That's going to have to change if we want direct placement
> of the write payload, and at that point an unaligned payload will
> be possible.
> 
> 
> > We could use comments each time elsewhere when we add another random 1
> > or 2 somewhere.  What does sv_max_mesg really mean, why are we adding 2
> > to it in svc_alloc_arg, etc.
> 
> I'm open to suggestions. A comment in svc_alloc_arg might just
> refer weary readers to the comment in front of the definition of
> RPCSVC_MAXPAGES.

There's 3 extra pages in rq_pages beyond what's needed to hold a max IO
size buffer, and I'd like the comment to explain *which* 2 of those
three pages we're accounting for here.  I could figure that out if I
knew what xv_max_mesg meant, but the comment there just says 1 page is
added for "overhead", which isn't helpful.

> Let me know if there's anything else you want to see changed, or
> if this patch is now acceptable.
> 
> 
> > Might make more sense if RPCSVC_MAXPAYLOAD was explicitly calculated as
> > a max over the compiled-in transports, too.
> 
> IMO sv_max_mesg could be expressed in pages, instead of bytes, so
> that the computation in svc_alloc_arg isn't repeated for every RPC.

Makes sense to me.

> 
> 
> > I feel like we've actually been through this before, but somehow I'm
> > still confused every time I look at it.
> > 
> > --b.
> > 
> > On Fri, Jun 30, 2017 at 12:03:54PM -0400, Chuck Lever wrote:
> >> svcrdma needs 259 pages for 1MB NFSv4.0 READ requests:
> 
> This should be
> 
> "svcrdma needs 259 pages for 1MB NFSv4.0 WRITE requests:"

Oh, got it.

--b.

> 
> 
> >> - 1 page for the transport header and head iovec
> >> - 256 pages for the data payload
> >> - 1 page for the trailing GETATTR request (since NFSD XDR decoding
> >>   does not look for a tail iovec, the GETATTR is stuck at the end
> >>   of the rqstp->rq_arg.pages list)
> >> - 1 page for building the reply xdr_buf
> >> 
> >> But RPCSVC_MAXPAGES is already 259 (on x86_64). The problem is that
> >> svc_alloc_arg never allocates that many pages. To address this:
> >> 
> >> 1. The final element of rq_pages always points to NULL. To
> >>   accommodate up to 259 pages in rq_pages, add an extra element
> >>   to rq_pages for the array termination sentinel.
> >> 
> >> 2. Adjust the calculation of "pages" to match how RPCSVC_MAXPAGES
> >>   is calculated, so it can go up to 259. Bruce noted that the
> >>   calculation assumes sv_max_mesg is a multiple of PAGE_SIZE,
> >>   which might not always be true. I didn't change this assumption.
> >> 
> >> 3. Change the loop boundaries to allow 259 pages to be allocated.
> >> 
> >> Additional clean-up: WARN_ON_ONCE adds an extra conditional branch,
> >> which is basically never taken. And there's no need to dump the
> >> stack here because svc_alloc_arg has only one caller.
> >> 
> >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >> ---
> >> Hi Bruce-
> >> 
> >> This is what I proposed yesterday evening. Instead of changing
> >> the value of RPCSVC_MAXPAGES, ensure that svc_alloc_arg can
> >> safely allocate that many pages.
> >> 
> >> 
> >> include/linux/sunrpc/svc.h |    2 +-
> >> net/sunrpc/svc_xprt.c      |   10 ++++++----
> >> 2 files changed, 7 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> >> index 11cef5a..d741399 100644
> >> --- a/include/linux/sunrpc/svc.h
> >> +++ b/include/linux/sunrpc/svc.h
> >> @@ -246,7 +246,7 @@ struct svc_rqst {
> >> 	size_t			rq_xprt_hlen;	/* xprt header len */
> >> 	struct xdr_buf		rq_arg;
> >> 	struct xdr_buf		rq_res;
> >> -	struct page *		rq_pages[RPCSVC_MAXPAGES];
> >> +	struct page		*rq_pages[RPCSVC_MAXPAGES + 1];
> >> 	struct page *		*rq_respages;	/* points into rq_pages */
> >> 	struct page *		*rq_next_page; /* next reply page to use */
> >> 	struct page *		*rq_page_end;  /* one past the last page */
> >> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> >> index 7bfe1fb..d16a8b4 100644
> >> --- a/net/sunrpc/svc_xprt.c
> >> +++ b/net/sunrpc/svc_xprt.c
> >> @@ -659,11 +659,13 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> >> 	int i;
> >> 
> >> 	/* now allocate needed pages.  If we get a failure, sleep briefly */
> >> -	pages = (serv->sv_max_mesg + PAGE_SIZE) / PAGE_SIZE;
> >> -	WARN_ON_ONCE(pages >= RPCSVC_MAXPAGES);
> >> -	if (pages >= RPCSVC_MAXPAGES)
> >> +	pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
> >> +	if (pages > RPCSVC_MAXPAGES) {
> >> +		pr_warn_once("svc: warning: pages=%u > RPCSVC_MAXPAGES=%lu\n",
> >> +			     pages, RPCSVC_MAXPAGES);
> >> 		/* use as many pages as possible */
> >> -		pages = RPCSVC_MAXPAGES - 1;
> >> +		pages = RPCSVC_MAXPAGES;
> >> +	}
> >> 	for (i = 0; i < pages ; i++)
> >> 		while (rqstp->rq_pages[i] == NULL) {
> >> 			struct page *p = alloc_page(GFP_KERNEL);
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> Chuck Lever
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux