> On Nov 25, 2020, at 7:21 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
>
> On Tue, Nov 24, 2020 at 10:40:25PM +0000, Kornievskaia, Olga wrote:
>>
>> On 11/24/20, 4:20 PM, "Frank van der Linden" <fllinden@xxxxxxxxxx> wrote:
>>
>> On Tue, Nov 24, 2020 at 08:50:36PM +0000, Kornievskaia, Olga wrote:
>>>
>>> On 11/24/20, 3:06 PM, "Frank van der Linden" <fllinden@xxxxxxxxxx> wrote:
>>>
>>> On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
>>>>
>>>> By switching to an XFS-backed export, I am able to reproduce the
>>>> ibcomp worker crash on my client with xfstests generic/013.
>>>>
>>>> For the failing LISTXATTRS operation, xdr_inline_pages() is called
>>>> with page_len=12 and buflen=128. Then:
>>>>
>>>> - Because buflen is small, rpcrdma_marshal_req() does not set up a
>>>>   Reply chunk, so rpcrdma's XDRBUF_SPARSE_PAGES logic is never
>>>>   invoked at all.
>>>>
>>>> - Because page_len is non-zero, rpcrdma_inline_fixup() tries to
>>>>   copy received data into rq_rcv_buf->pages, but those pages are
>>>>   missing.
>>>>
>>>> The result is that the ibcomp worker faults and dies. Sometimes that
>>>> causes a visible crash, and sometimes it results in a transport
>>>> hang without other symptoms.
>>>>
>>>> RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct and
>>>> should eventually be fixed or replaced. However, my preference is
>>>> that upper-layer operations should explicitly allocate their receive
>>>> buffers (using GFP_KERNEL) where possible, rather than relying on
>>>> XDRBUF_SPARSE_PAGES.
>>>>
>>>> Reported-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>> Suggested-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>> ---
>>>>  fs/nfs/nfs42proc.c |   17 ++++++++++-------
>>>>  fs/nfs/nfs42xdr.c  |    1 -
>>>>  2 files changed, 10 insertions(+), 8 deletions(-)
>>>>
>>>> Hi-
>>>>
>>>> I like Olga's proposed approach. What do you think of this patch?
>>>>
>>>> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
>>>> index 2b2211d1234e..24810305ec1c 100644
>>>> --- a/fs/nfs/nfs42proc.c
>>>> +++ b/fs/nfs/nfs42proc.c
>>>> @@ -1241,7 +1241,7 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
>>>>  		.rpc_resp = &res,
>>>>  	};
>>>>  	u32 xdrlen;
>>>> -	int ret, np;
>>>> +	int ret, np, i;
>>>>
>>>>  	res.scratch = alloc_page(GFP_KERNEL);
>>>> @@ -1253,10 +1253,14 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
>>>>  	xdrlen = server->lxasize;
>>>>  	np = xdrlen / PAGE_SIZE + 1;
>>>>
>>>> +	ret = -ENOMEM;
>>>>  	pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
>>>> -	if (pages == NULL) {
>>>> -		__free_page(res.scratch);
>>>> -		return -ENOMEM;
>>>> +	if (pages == NULL)
>>>> +		goto out_free;
>>>> +	for (i = 0; i < np; i++) {
>>>> +		pages[i] = alloc_page(GFP_KERNEL);
>>>> +		if (!pages[i])
>>>> +			goto out_free;
>>>>  	}
>>>>
>>>>  	arg.xattr_pages = pages;
>>>> @@ -1271,14 +1275,13 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
>>>>  		*eofp = res.eof;
>>>>  	}
>>>>
>>>> +out_free:
>>>>  	while (--np >= 0) {
>>>>  		if (pages[np])
>>>>  			__free_page(pages[np]);
>>>>  	}
>>>> -
>>>> -	__free_page(res.scratch);
>>>>  	kfree(pages);
>>>> -
>>>> +	__free_page(res.scratch);
>>>>  	return ret;
>>>>  }
>>>> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
>>>> index 6e060a88f98c..8432bd6b95f0 100644
>>>> --- a/fs/nfs/nfs42xdr.c
>>>> +++ b/fs/nfs/nfs42xdr.c
>>>> @@ -1528,7 +1528,6 @@ static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,
>>>>
>>>>  	rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
>>>>  			hdr.replen);
>>>> -	req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
>>>>
>>>>  	encode_nops(&hdr);
>>>>  }
>>>
>>> I can see why this is the simplest and most pragmatic solution, so it's
>>> fine with me.
>>>
>>> Why doesn't this happen with getxattr? Do we need to convert that too?
>>>
>>> [olga] I don't know whether GETXATTR/SETXATTR works; I'm not sure what
>>> tests exercise those operations. I just ran into the fact that
>>> generic/013 wasn't passing, and it doesn't appear to be an
>>> xattr-specific test. I'm not sure how it ends up triggering xattr
>>> usage.
>>
>> I'm attaching the test program I used; it should give things a better
>> workout.
>>
>> [olga] I'm not sure if I'm doing something wrong, but there are only
>> 2 GETXATTR calls in the network trace from running this application,
>> and both return an error (ERR_NOXATTR), which, btw, might explain why
>> no problems are seen: no data is being decoded. There are lots of
>> SETXATTRs and REMOVEXATTRs, and there is a LISTXATTR (which, btw, the
>> network trace marks as malformed, so there might be something bad
>> there). Anyway...
>>
>> This is my initial report: no real exercise of the GETXATTR code as
>> far as I can tell.
>
> True, the test is heavier on the setxattr / listxattr side. And with
> caching, you're not going to see a lot of GETXATTR calls. I used the
> same test program with caching off, and it works fine, though.

I unintentionally broke GETXATTR while developing the LISTXATTRS fix,
and generic/013 rather aggressively informed me that GETXATTR was no
longer working. There is some test coverage there, fwiw.

> In any case, after converting GETXATTR to pre-allocate pages, I noticed
> that, when I disabled caching, I was getting EIO instead of ERANGE back
> from calls that test the case of calling getxattr() with a buffer
> length that is insufficient.

Was TCP the underlying RPC transport?
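
For illustration, the pre-allocation Frank refers to presumably follows
the same pattern as the LISTXATTRS patch above. A minimal sketch of that
pattern as a stand-alone helper; the name nfs42_alloc_pages is
hypothetical, and this is not code from either patch:

	/*
	 * Allocate every receive page up front with GFP_KERNEL so the
	 * transport never has to fault pages in via XDRBUF_SPARSE_PAGES.
	 */
	static struct page **nfs42_alloc_pages(size_t buflen, int *npages)
	{
		int np = DIV_ROUND_UP(buflen, PAGE_SIZE);
		struct page **pages;
		int i;

		pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
		if (!pages)
			return NULL;
		for (i = 0; i < np; i++) {
			pages[i] = alloc_page(GFP_KERNEL);
			if (!pages[i]) {
				/* unwind the pages allocated so far */
				while (--i >= 0)
					__free_page(pages[i]);
				kfree(pages);
				return NULL;
			}
		}
		*npages = np;
		return pages;
	}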
> The behavior is somewhat strange: if you, say, set an xattr of length
> 59, then calls with lengths 56-59 get -ERANGE from decode_getxattr
> (correct), but calls with lengths 53-55 get EIO (they should also get
> -ERANGE).
>
> In other words, non-aligned lengths passed to rpc_prepare_reply_pages
> make the RPC call error out early, before it even gets to
> decode_getxattr.
>
> I noticed that all other code always seems to specify multiples of
> PAGE_SIZE as the length to rpc_prepare_reply_pages. But the code itself
> suggests that it certainly *intends* to handle any length, aligned or
> not.
>
> Apparently, however, it at least doesn't deal with non-aligned lengths,
> making things fail further down the line.
>
> I need to look at this a bit more.
>
> - Frank

-- 
Chuck Lever
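
For illustration, one conceivable way around the alignment limitation
Frank describes would be to pad the length handed to
rpc_prepare_reply_pages() to a PAGE_SIZE multiple and let the XDR
decoder enforce the caller's real buffer limit, returning -ERANGE
there. A minimal sketch, assuming enough pages were allocated to cover
the padded length; the wrapper below is hypothetical and is not
necessarily the fix that was eventually adopted:

	static void nfs42_prepare_xattr_reply_pages(struct rpc_rqst *req,
						    struct page **pages,
						    u32 buflen, u32 replen)
	{
		/*
		 * rpc_prepare_reply_pages() appears to misbehave with
		 * non-page-aligned lengths, so always pass an aligned
		 * value; decode_getxattr() still compares the decoded
		 * attribute size against the caller's original buflen.
		 */
		rpc_prepare_reply_pages(req, pages, 0,
					ALIGN(buflen, PAGE_SIZE), replen);
	}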