On Fri, Jun 30, 2017 at 04:33:23PM -0400, Chuck Lever wrote: > > > On Jun 30, 2017, at 4:16 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote: > > > > I like that better. > > > > I'm still pretty confused. > > > > The long comment above RPCSVC_MAXPAGES convinces me that it's > > calculating the maximum number of pages a TCP request might need. But > > apparently rq_pages has never been large enough to hold that many pages. > > Has nobody ever tried an unaligned 1MB read? > > I'm not sure how the TCP transport works. If it tries to fill the > head iovec completely, then goes into the page list, then it might > not ever need the extra page on the end for the GETATTR. I think an unaligned read of N pages ends up with rq_pages like: 1st page: read request 2nd page: head+tail of reply 3rd page: partial first page of read data N-1 pages: read data last page: partial last page of read data So, N+3 pages. > For NFS/RDMA, the transport currently aligns the incoming data so > it always starts at byte 0 of the first page in the xdr_buf's page > list. That's going to have to change if we want direct placement > of the write payload, and at that point an unaligned payload will > be possible. > > > > We could use comments each time elsewhere when we add another random 1 > > or 2 somewhere. What does sv_max_mesg really mean, why are we adding 2 > > to it in svc_alloc_arg, etc. > > I'm open to suggestions. A comment in svc_alloc_arg might just > refer weary readers to the comment in front of the definition of > RPCSVC_MAXPAGES. There's 3 extra pages in rq_pages beyond what's needed to hold a max IO size buffer, and I'd like the comment to explain *which* 2 of those three pages we're accounting for here. I could figure that out if I knew what xv_max_mesg meant, but the comment there just says 1 page is added for "overhead", which isn't helpful. > Let me know if there's anything else you want to see changed, or > if this patch is now acceptable. > > > > Might make more sense if RPCSVC_MAXPAYLOAD was explicitly calculated as > > a max over the compiled-in transports, too. > > IMO sv_max_mesg could be expressed in pages, instead of bytes, so > that the computation in svc_alloc_arg isn't repeated for every RPC. Makes sense to me. > > > > I feel like we've actually been through this before, but somehow I'm > > still confused every time I look at it. > > > > --b. > > > > On Fri, Jun 30, 2017 at 12:03:54PM -0400, Chuck Lever wrote: > >> svcrdma needs 259 pages for 1MB NFSv4.0 READ requests: > > This should be > > "svcrdma needs 259 pages for 1MB NFSv4.0 WRITE requests:" Oh, got it. --b. > > > >> - 1 page for the transport header and head iovec > >> - 256 pages for the data payload > >> - 1 page for the trailing GETATTR request (since NFSD XDR decoding > >> does not look for a tail iovec, the GETATTR is stuck at the end > >> of the rqstp->rq_arg.pages list) > >> - 1 page for building the reply xdr_buf > >> > >> But RPCSVC_MAXPAGES is already 259 (on x86_64). The problem is that > >> svc_alloc_arg never allocates that many pages. To address this: > >> > >> 1. The final element of rq_pages always points to NULL. To > >> accommodate up to 259 pages in rq_pages, add an extra element > >> to rq_pages for the array termination sentinel. > >> > >> 2. Adjust the calculation of "pages" to match how RPCSVC_MAXPAGES > >> is calculated, so it can go up to 259. Bruce noted that the > >> calculation assumes sv_max_mesg is a multiple of PAGE_SIZE, > >> which might not always be true. I didn't change this assumption. > >> > >> 3. Change the loop boundaries to allow 259 pages to be allocated. > >> > >> Additional clean-up: WARN_ON_ONCE adds an extra conditional branch, > >> which is basically never taken. And there's no need to dump the > >> stack here because svc_alloc_arg has only one caller. > >> > >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > >> --- > >> Hi Bruce- > >> > >> This is what I proposed yesterday evening. Instead of changing > >> the value of RPCSVC_MAXPAGES, ensure that svc_alloc_arg can > >> safely allocate that many pages. > >> > >> > >> include/linux/sunrpc/svc.h | 2 +- > >> net/sunrpc/svc_xprt.c | 10 ++++++---- > >> 2 files changed, 7 insertions(+), 5 deletions(-) > >> > >> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h > >> index 11cef5a..d741399 100644 > >> --- a/include/linux/sunrpc/svc.h > >> +++ b/include/linux/sunrpc/svc.h > >> @@ -246,7 +246,7 @@ struct svc_rqst { > >> size_t rq_xprt_hlen; /* xprt header len */ > >> struct xdr_buf rq_arg; > >> struct xdr_buf rq_res; > >> - struct page * rq_pages[RPCSVC_MAXPAGES]; > >> + struct page *rq_pages[RPCSVC_MAXPAGES + 1]; > >> struct page * *rq_respages; /* points into rq_pages */ > >> struct page * *rq_next_page; /* next reply page to use */ > >> struct page * *rq_page_end; /* one past the last page */ > >> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c > >> index 7bfe1fb..d16a8b4 100644 > >> --- a/net/sunrpc/svc_xprt.c > >> +++ b/net/sunrpc/svc_xprt.c > >> @@ -659,11 +659,13 @@ static int svc_alloc_arg(struct svc_rqst *rqstp) > >> int i; > >> > >> /* now allocate needed pages. If we get a failure, sleep briefly */ > >> - pages = (serv->sv_max_mesg + PAGE_SIZE) / PAGE_SIZE; > >> - WARN_ON_ONCE(pages >= RPCSVC_MAXPAGES); > >> - if (pages >= RPCSVC_MAXPAGES) > >> + pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT; > >> + if (pages > RPCSVC_MAXPAGES) { > >> + pr_warn_once("svc: warning: pages=%u > RPCSVC_MAXPAGES=%lu\n", > >> + pages, RPCSVC_MAXPAGES); > >> /* use as many pages as possible */ > >> - pages = RPCSVC_MAXPAGES - 1; > >> + pages = RPCSVC_MAXPAGES; > >> + } > >> for (i = 0; i < pages ; i++) > >> while (rqstp->rq_pages[i] == NULL) { > >> struct page *p = alloc_page(GFP_KERNEL); > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > Chuck Lever > > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html