> On Nov 25, 2020, at 7:21 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
>
> On Tue, Nov 24, 2020 at 10:40:25PM +0000, Kornievskaia, Olga wrote:
>>
>> On 11/24/20, 4:20 PM, "Frank van der Linden" <fllinden@xxxxxxxxxx> wrote:
>>
>> On Tue, Nov 24, 2020 at 08:50:36PM +0000, Kornievskaia, Olga wrote:
>>>
>>> On 11/24/20, 3:06 PM, "Frank van der Linden" <fllinden@xxxxxxxxxx> wrote:
>>>
>>> On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
>>>>
>>>> By switching to an XFS-backed export, I am able to reproduce the
>>>> ibcomp worker crash on my client with xfstests generic/013.
>>>>
>>>> For the failing LISTXATTRS operation, xdr_inline_pages() is called
>>>> with page_len=12 and buflen=128. Then:
>>>>
>>>> - Because buflen is small, rpcrdma_marshal_req() does not set up a
>>>>   Reply chunk, so rpcrdma's XDRBUF_SPARSE_PAGES logic is never
>>>>   invoked at all.
>>>>
>>>> - Because page_len is non-zero, rpcrdma_inline_fixup() tries to
>>>>   copy received data into rq_rcv_buf->pages, but those pages are
>>>>   missing.
>>>>
>>>> The result is that the ibcomp worker faults and dies. Sometimes that
>>>> causes a visible crash, and sometimes it results in a transport
>>>> hang without other symptoms.
>>>>
>>>> RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct and
>>>> should eventually be fixed or replaced. However, my preference is
>>>> that upper-layer operations should explicitly allocate their receive
>>>> buffers (using GFP_KERNEL) where possible, rather than relying on
>>>> XDRBUF_SPARSE_PAGES.
>>>>
>>>> Reported-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>> Suggested-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>> ---
>>>>  fs/nfs/nfs42proc.c |   17 ++++++++++-------
>>>>  fs/nfs/nfs42xdr.c  |    1 -
>>>>  2 files changed, 10 insertions(+), 8 deletions(-)
>>>>
>>>> Hi-
>>>>
>>>> I like Olga's proposed approach. What do you think of this patch?
>>>>
>>>> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
>>>> index 2b2211d1234e..24810305ec1c 100644
>>>> --- a/fs/nfs/nfs42proc.c
>>>> +++ b/fs/nfs/nfs42proc.c
>>>> @@ -1241,7 +1241,7 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
>>>>  		.rpc_resp = &res,
>>>>  	};
>>>>  	u32 xdrlen;
>>>> -	int ret, np;
>>>> +	int ret, np, i;
>>>>
>>>>  	res.scratch = alloc_page(GFP_KERNEL);
>>>> @@ -1253,10 +1253,14 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
>>>>  	xdrlen = server->lxasize;
>>>>  	np = xdrlen / PAGE_SIZE + 1;
>>>>
>>>> +	ret = -ENOMEM;
>>>>  	pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
>>>> -	if (pages == NULL) {
>>>> -		__free_page(res.scratch);
>>>> -		return -ENOMEM;
>>>> +	if (pages == NULL)
>>>> +		goto out_free;
>>>> +	for (i = 0; i < np; i++) {
>>>> +		pages[i] = alloc_page(GFP_KERNEL);
>>>> +		if (!pages[i])
>>>> +			goto out_free;
>>>>  	}
>>>>
>>>>  	arg.xattr_pages = pages;
>>>> @@ -1271,14 +1275,13 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
>>>>  		*eofp = res.eof;
>>>>  	}
>>>>
>>>> +out_free:
>>>>  	while (--np >= 0) {
>>>>  		if (pages[np])
>>>>  			__free_page(pages[np]);
>>>>  	}
>>>> -
>>>> -	__free_page(res.scratch);
>>>>  	kfree(pages);
>>>> -
>>>> +	__free_page(res.scratch);
>>>>  	return ret;
>>>>  }
>>>> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
>>>> index 6e060a88f98c..8432bd6b95f0 100644
>>>> --- a/fs/nfs/nfs42xdr.c
>>>> +++ b/fs/nfs/nfs42xdr.c
>>>> @@ -1528,7 +1528,6 @@ static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,
>>>>
>>>>  	rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
>>>>  			hdr.replen);
>>>> -	req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
>>>>
>>>>  	encode_nops(&hdr);
>>>>  }
>>>
>>> I can see why this is the simplest and most pragmatic solution, so it's
>>> fine with me.
>>>
>>> Why doesn't this happen with getxattr? Do we need to convert that too?
>>>
>>> [olga] I don't know whether GETXATTR/SETXATTR works; I'm not sure what
>>> tests exercise those operations. I just ran into the fact that
>>> generic/013 wasn't passing, and it doesn't appear to be an
>>> xattr-specific test. I'm not sure how it ends up triggering xattr
>>> usage.
>>
>> I'm attaching the test program I used; it should give things a better
>> workout.
>>
>> [olga] I'm not sure if I'm doing something wrong, but there are only
>> 2 GETXATTR calls in the network trace from running this application,
>> and both return an error (ERR_NOXATTR), which, btw, might explain why
>> no problems are seen: no data is being decoded. There are lots of
>> SETXATTRs and REMOVEXATTRs, and there is a LISTXATTR (which, btw, the
>> network trace marks as malformed, so there might be something bad
>> there). Anyway...
>>
>> This is my initial report: no real exercise of the GETXATTR code as
>> far as I can tell.
>
> True, the test is heavier on the setxattr / listxattr side. And with
> caching, you're not going to see a lot of GETXATTR calls. I used the
> same test program with caching off, and it works fine, though.

I unintentionally broke GETXATTR while developing the LISTXATTRS fix,
and generic/013 rather aggressively informed me that GETXATTR was no
longer working. There is some test coverage there, fwiw.

> In any case, after converting GETXATTR to pre-allocate pages, I noticed
> that, when I disabled caching, I was getting EIO instead of ERANGE back
> from calls that test the case of calling getxattr() with a buffer
> length that is insufficient.

Was TCP the underlying RPC transport?
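
For illustration, the pre-allocation Frank refers to presumably follows
the same pattern as the LISTXATTRS patch above. A minimal sketch of that
pattern as a stand-alone helper; the name nfs42_alloc_pages is
hypothetical, and this is not code from either patch:

	/*
	 * Allocate every receive page up front with GFP_KERNEL so the
	 * transport never has to fault pages in via XDRBUF_SPARSE_PAGES.
	 */
	static struct page **nfs42_alloc_pages(size_t buflen, int *npages)
	{
		int np = DIV_ROUND_UP(buflen, PAGE_SIZE);
		struct page **pages;
		int i;

		pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
		if (!pages)
			return NULL;
		for (i = 0; i < np; i++) {
			pages[i] = alloc_page(GFP_KERNEL);
			if (!pages[i]) {
				/* unwind the pages allocated so far */
				while (--i >= 0)
					__free_page(pages[i]);
				kfree(pages);
				return NULL;
			}
		}
		*npages = np;
		return pages;
	}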
> The behavior is somewhat strange: if you, say, set an xattr of length
> 59, then calls with lengths 56-59 get -ERANGE from decode_getxattr
> (correct), but calls with lengths 53-55 get EIO (they should also get
> -ERANGE).
>
> In other words, non-aligned lengths passed to rpc_prepare_reply_pages
> make the RPC call error out early, before it even gets to
> decode_getxattr.
>
> I noticed that all other code always seems to specify multiples of
> PAGE_SIZE as the length to rpc_prepare_reply_pages. But the code itself
> suggests that it certainly *intends* to handle any length, aligned or
> not.
>
> Apparently, however, it at least doesn't deal with non-aligned lengths,
> making things fail further down the line.
>
> I need to look at this a bit more.
>
> - Frank

-- 
Chuck Lever
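
For illustration, one conceivable way around the alignment limitation
Frank describes would be to pad the length handed to
rpc_prepare_reply_pages() to a PAGE_SIZE multiple and let the XDR
decoder enforce the caller's real buffer limit, returning -ERANGE
there. A minimal sketch, assuming enough pages were allocated to cover
the padded length; the wrapper below is hypothetical and is not
necessarily the fix that was eventually adopted:

	static void nfs42_prepare_xattr_reply_pages(struct rpc_rqst *req,
						    struct page **pages,
						    u32 buflen, u32 replen)
	{
		/*
		 * rpc_prepare_reply_pages() appears to misbehave with
		 * non-page-aligned lengths, so always pass an aligned
		 * value; decode_getxattr() still compares the decoded
		 * attribute size against the caller's original buflen.
		 */
		rpc_prepare_reply_pages(req, pages, 0,
					ALIGN(buflen, PAGE_SIZE), replen);
	}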