> On Nov 26, 2020, at 2:32 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
> 
> On Thu, Nov 26, 2020 at 12:10:21PM -0500, Chuck Lever wrote:
>> 
>> 
>>> On Nov 25, 2020, at 7:21 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
>>> 
>>> On Tue, Nov 24, 2020 at 10:40:25PM +0000, Kornievskaia, Olga wrote:
>>>> 
>>>> 
>>>> On 11/24/20, 4:20 PM, "Frank van der Linden" <fllinden@xxxxxxxxxx> wrote:
>>>> 
>>>> On Tue, Nov 24, 2020 at 08:50:36PM +0000, Kornievskaia, Olga wrote:
>>>>> 
>>>>> 
>>>>> On 11/24/20, 3:06 PM, "Frank van der Linden" <fllinden@xxxxxxxxxx> wrote:
>>>>> 
>>>>> On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> By switching to an XFS-backed export, I am able to reproduce the
>>>>>> ibcomp worker crash on my client with xfstests generic/013.
>>>>>> 
>>>>>> For the failing LISTXATTRS operation, xdr_inline_pages() is called
>>>>>> with page_len=12 and buflen=128. Then:
>>>>>> 
>>>>>> - Because buflen is small, rpcrdma_marshal_req will not set up a
>>>>>>   Reply chunk and rpcrdma's XDRBUF_SPARSE_PAGES logic does not
>>>>>>   get invoked at all.
>>>>>> 
>>>>>> - Because page_len is non-zero, rpcrdma_inline_fixup() tries to
>>>>>>   copy received data into rq_rcv_buf->pages, but they're missing.
>>>>>> 
>>>>>> The result is that the ibcomp worker faults and dies. Sometimes that
>>>>>> causes a visible crash, and sometimes it results in a transport
>>>>>> hang without other symptoms.
>>>>>> 
>>>>>> RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and
>>>>>> should eventually be fixed or replaced. However, my preference is
>>>>>> that upper-layer operations should explicitly allocate their receive
>>>>>> buffers (using GFP_KERNEL) when possible, rather than relying on
>>>>>> XDRBUF_SPARSE_PAGES.
>>>>>> 
>>>>>> Reported-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>>>> Suggested-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
>>>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>>>> ---
>>>>>>  fs/nfs/nfs42proc.c |   17 ++++++++++-------
>>>>>>  fs/nfs/nfs42xdr.c  |    1 -
>>>>>>  2 files changed, 10 insertions(+), 8 deletions(-)
>>>>>> 
>>>>>> Hi-
>>>>>> 
>>>>>> I like Olga's proposed approach. What do you think of this patch?
>>>>>> 
>>>>>> 
>>>>>> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
>>>>>> index 2b2211d1234e..24810305ec1c 100644
>>>>>> --- a/fs/nfs/nfs42proc.c
>>>>>> +++ b/fs/nfs/nfs42proc.c
>>>>>> @@ -1241,7 +1241,7 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
>>>>>>  		.rpc_resp = &res,
>>>>>>  	};
>>>>>>  	u32 xdrlen;
>>>>>> -	int ret, np;
>>>>>> +	int ret, np, i;
>>>>>> 
>>>>>> 
>>>>>>  	res.scratch = alloc_page(GFP_KERNEL);
>>>>>> @@ -1253,10 +1253,14 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
>>>>>>  	xdrlen = server->lxasize;
>>>>>>  	np = xdrlen / PAGE_SIZE + 1;
>>>>>> 
>>>>>> +	ret = -ENOMEM;
>>>>>>  	pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
>>>>>> -	if (pages == NULL) {
>>>>>> -		__free_page(res.scratch);
>>>>>> -		return -ENOMEM;
>>>>>> +	if (pages == NULL)
>>>>>> +		goto out_free;
>>>>>> +	for (i = 0; i < np; i++) {
>>>>>> +		pages[i] = alloc_page(GFP_KERNEL);
>>>>>> +		if (!pages[i])
>>>>>> +			goto out_free;
>>>>>>  	}
>>>>>> 
>>>>>>  	arg.xattr_pages = pages;
>>>>>> @@ -1271,14 +1275,13 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
>>>>>>  		*eofp = res.eof;
>>>>>>  	}
>>>>>> 
>>>>>> +out_free:
>>>>>>  	while (--np >= 0) {
>>>>>>  		if (pages[np])
>>>>>>  			__free_page(pages[np]);
>>>>>>  	}
>>>>>> -
>>>>>> -	__free_page(res.scratch);
>>>>>>  	kfree(pages);
>>>>>> -
>>>>>> +	__free_page(res.scratch);
>>>>>>  	return ret;
>>>>>> 
>>>>>>  }
>>>>>> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
>>>>>> index 6e060a88f98c..8432bd6b95f0 100644
>>>>>> --- a/fs/nfs/nfs42xdr.c
>>>>>> +++ b/fs/nfs/nfs42xdr.c
>>>>>> @@ -1528,7 +1528,6 @@ static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,
>>>>>> 
>>>>>>  	rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
>>>>>>  				hdr.replen);
>>>>>> -	req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
>>>>>> 
>>>>>>  	encode_nops(&hdr);
>>>>>>  }
>>>>>> 
>>>>>> 
>>>>> 
>>>>> I can see why this is the simplest and most pragmatic solution, so it's
>>>>> fine with me.
>>>>> 
>>>>> Why doesn't this happen with getxattr? Do we need to convert that too?
>>>>> 
>>>>> [olga] I don't know if GETXATTR/SETXATTR works. I'm not sure what tests
>>>>> exercise those operations. I just ran into the fact that generic/013
>>>>> wasn't passing, and I don't see that it's an xattr-specific test. I'm
>>>>> not sure how it ends up triggering usage of xattrs.
>>>> 
>>>> I'm attaching the test program I used; it should give things a better workout.
>>>> 
>>>> [olga] I'm not sure if I'm doing something wrong, but there are only 2
>>>> GETXATTR calls in the network trace from running this application, and
>>>> both calls are returning an error (ERR_NOXATTR), which btw might explain
>>>> why no problems are seen, since no decoding of data is happening. There
>>>> are lots of SETXATTRs and REMOVEXATTRs, and there is a LISTXATTR (which,
>>>> btw, the network trace is marking as malformed, so there might be
>>>> something bad there). Anyway...
>>>> 
>>>> This is my initial report: no real exercise of the GETXATTR code as far as I can tell.
>>> 
>>> True, the test is heavier on the setxattr / listxattr side. And with caching,
>>> you're not going to see a lot of GETXATTR calls. I used the same test program
>>> with caching off, and it works fine, though.
>> 
>> I unintentionally broke GETXATTR while developing the LISTXATTRS fix,
>> and generic/013 rather aggressively informed me that GETXATTR was no
>> longer working. There is some test coverage there, fwiw.
> 
> Oh, the coverage was good - in my testing I also used a collection of
> small unit test programs, and I was the one who made the xattr tests
> in xfstests work on NFS.
> 
>> 
>> 
>>> In any case, after converting GETXATTR to pre-allocate pages, I noticed that,
>>> when I disabled caching, I was getting EIO instead of ERANGE back from
>>> calls that test the case of calling getxattr() with a buffer length that
>>> is insufficient.
>> 
>> Was TCP the underlying RPC transport?
> 
> Yes, this was TCP. I have set up rdma over rxe, which I'll test too if I
> can get this figured out. It might be a long standing bug in xdr_inline_pages,
> as listxattr / getxattr seem to be simply the first ones to pass in a
> length that is not page aligned.

Or your maxsz macro could be missing a "+ 1" for the XDR pad needed for
the unaligned length cases.

> It does make some sense to round the length up to PAGE_SIZE, and just check if
> the received data fits when decoding, like other calls do. It improves your
> chances of getting a result that you can still cache, even if it's too big for
> the length that was asked for. E.g. if the result is > requested_length, but
> < ROUND_UP(requested_length, PAGE_SIZE), you can cache it, even though you
> have to return -ERANGE to the caller.
> 
> - Frank

--
Chuck Lever
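To illustrate the "+ 1" point: XDR pads variable-length data out to a 4-byte
boundary, so a reply-size estimate counted in 32-bit XDR words needs one extra
word whenever the byte length is not a multiple of four. Below is a minimal
standalone sketch (plain userspace C, not the actual nfs42xdr.c maxsz macros;
payload_len and the word arithmetic are illustrative only):

#include <stdio.h>

/* Mirrors the kernel's bytes-to-XDR-words rounding macro. */
#define XDR_QUADLEN(l)	(((l) + 3) >> 2)

int main(void)
{
	unsigned int payload_len = 13;	/* an unaligned xattr name length, say */

	/* Under-estimate: forgets the pad word for unaligned lengths. */
	unsigned int words_short = 1 /* length word */ + payload_len / 4;

	/* Safe estimate: round the byte count up to whole XDR words. */
	unsigned int words_padded = 1 /* length word */ + XDR_QUADLEN(payload_len);

	printf("short=%u words, padded=%u words\n", words_short, words_padded);
	return 0;
}

For payload_len = 13 the two estimates differ by exactly one word, which is
the "+ 1" a maxsz macro has to reserve.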
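And a rough sketch of the decode-time policy described above: accept a reply
that overflows the caller's requested length as long as it still fits the
page-aligned receive buffer, cache it, and hand -ERANGE back to the caller.
The helper names and the page-size constant here are made up for illustration;
this is not the actual NFS decode path:

#include <errno.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* stand-in for the kernel's PAGE_SIZE */

/* Round a byte count up to a whole number of pages. */
static size_t round_up_page(size_t len)
{
	return (len + SKETCH_PAGE_SIZE - 1) & ~(size_t)(SKETCH_PAGE_SIZE - 1);
}

/*
 * 0       -> reply fits the caller's buffer
 * -ERANGE -> reply fits the page-aligned receive buffer, so it is still
 *            cacheable, but the caller's buffer is too small
 * -EIO    -> reply does not even fit the pages set up for the receive
 */
static int check_reply_len(size_t requested_len, size_t rcvd_len)
{
	if (rcvd_len > round_up_page(requested_len))
		return -EIO;
	if (rcvd_len > requested_len)
		return -ERANGE;
	return 0;
}

int main(void)
{
	/* 200 bytes received for a 100-byte request: fits the page, so it is
	 * cacheable, but the caller still sees -ERANGE. */
	return check_reply_len(100, 200) == -ERANGE ? 0 : 1;
}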