On Tue, Nov 24, 2020 at 03:18:55PM -0500, Chuck Lever wrote: > > On Nov 24, 2020, at 3:06 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote: > > > > On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote: > >> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. > >> > >> > >> > >> By switching to an XFS-backed export, I am able to reproduce the > >> ibcomp worker crash on my client with xfstests generic/013. > >> > >> For the failing LISTXATTRS operation, xdr_inline_pages() is called > >> with page_len=12 and buflen=128. Then: > >> > >> - Because buflen is small, rpcrdma_marshal_req will not set up a > >> Reply chunk and the rpcrdma's XDRBUF_SPARSE_PAGES logic does not > >> get invoked at all. > >> > >> - Because page_len is non-zero, rpcrdma_inline_fixup() tries to > >> copy received data into rq_rcv_buf->pages, but they're missing. > >> > >> The result is that the ibcomp worker faults and dies. Sometimes that > >> causes a visible crash, and sometimes it results in a transport > >> hang without other symptoms. > >> > >> RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and > >> should eventually be fixed or replaced. However, my preference is > >> that upper-layer operations should explicitly allocate their receive > >> buffers (using GFP_KERNEL) when possible, rather than relying on > >> XDRBUF_SPARSE_PAGES. > >> > >> Reported-by: Olga kornievskaia <kolga@xxxxxxxxxx> > >> Suggested-by: Olga kornievskaia <kolga@xxxxxxxxxx> > >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > >> --- > >> fs/nfs/nfs42proc.c | 17 ++++++++++------- > >> fs/nfs/nfs42xdr.c | 1 - > >> 2 files changed, 10 insertions(+), 8 deletions(-) > >> > >> Hi- > >> > >> I like Olga's proposed approach. What do you think of this patch? > >> > >> > >> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c > >> index 2b2211d1234e..24810305ec1c 100644 > >> --- a/fs/nfs/nfs42proc.c > >> +++ b/fs/nfs/nfs42proc.c > >> @@ -1241,7 +1241,7 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf, > >> .rpc_resp = &res, > >> }; > >> u32 xdrlen; > >> - int ret, np; > >> + int ret, np, i; > >> > >> > >> res.scratch = alloc_page(GFP_KERNEL); > >> @@ -1253,10 +1253,14 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf, > >> xdrlen = server->lxasize; > >> np = xdrlen / PAGE_SIZE + 1; > >> > >> + ret = -ENOMEM; > >> pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL); > >> - if (pages == NULL) { > >> - __free_page(res.scratch); > >> - return -ENOMEM; > >> + if (pages == NULL) > >> + goto out_free; > >> + for (i = 0; i < np; i++) { > >> + pages[i] = alloc_page(GFP_KERNEL); > >> + if (!pages[i]) > >> + goto out_free; > >> } > >> > >> arg.xattr_pages = pages; > >> @@ -1271,14 +1275,13 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf, > >> *eofp = res.eof; > >> } > >> > >> +out_free: > >> while (--np >= 0) { > >> if (pages[np]) > >> __free_page(pages[np]); > >> } > >> - > >> - __free_page(res.scratch); > >> kfree(pages); > >> - > >> + __free_page(res.scratch); > >> return ret; > >> > >> } > >> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c > >> index 6e060a88f98c..8432bd6b95f0 100644 > >> --- a/fs/nfs/nfs42xdr.c > >> +++ b/fs/nfs/nfs42xdr.c > >> @@ -1528,7 +1528,6 @@ static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req, > >> > >> rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count, > >> hdr.replen); > >> - req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES; > >> > >> encode_nops(&hdr); > >> } > >> > >> > > > > I can see why this is the simplest and most pragmatic solution, so it's > > fine with me. > > Thanks. I've added Olga's Reviewed/Tested-by to my local copy. > May I add a Reviewed-by from you? Yep, thanks. > > Why doesn't this happen with getxattr? Do we need to convert that too? > > If a GETXATTR request can be generated such that buflen is less than > a page, but page_len > 0, then yes, it will happen there too. As Olga > says, NFS is unaware of the transport's inline threshold setting, so > there will always be some buflen value under which bad things happen. > > Either way, I would prefer to see GETXATTR converted to avoid using > XDRBUF_SPARSE_PAGES. Does NFSv4 GETACL provide a good example? Hm, in that case, it should be converted. That's easy enough to do. I'll send a small patch. > > > > The basic issue here is that the RPC code does not deal with inlined data > > that exceeds PAGE_SIZE. That can only be done with raw pages. > > > > Since the upper layer has already allocated a buffer in the case of listxattr > > and getxattr, I would love to be able to just XDR code in to that buffer, > > and void the whole alloc+copy situation. > > Do you mean like the READDIR entry encoders and decoders? It works out for READDIR - you use page cache fillers that do the XDR decoding in to pages that are part of i_mapping. For xattrs, that's not a good fit - I looked at it, but it would have been messy. The main thing I'm complaining about here is that the upper-layer xattr code already kvallocs a buffer, but because of restrictions in the XDR code, you always end up doing an extra copy for get/set, since you need to allocate pages. But, this doesn't apply to listxattr. > > > > But sadly, it might be > PAGE_SIZE, > > so the XDR code doesn't allow it. It's not all bad, having to use pages > > allows them to be directly hooked in to the cache in the case of getxattr, > > but for listxattr, decoding directly in to the provided buffer would be nice. > > > > Hm, I wonder if that restriction actually holds for listxattr - the invidual > > XDR items (xattr names) should never exceed PAGE_SIZE.. > > You'll have to worry about XDR data items crossing page boundaries. > Our XDR code uses a scratch buffer for that, so it can be handled > transparently. Yep, and this is what listxattr does. I was confusing the get/set and list cases, sorry about the confusion. Let me just quickly convert getxattr. - Frank