On Fri, Nov 27, 2020 at 3:40 AM Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
>
> On Thu, Nov 26, 2020 at 12:10:21PM -0500, Chuck Lever wrote:
> >
> >
> > > On Nov 25, 2020, at 7:21 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Nov 24, 2020 at 10:40:25PM +0000, Kornievskaia, Olga wrote:
> > >>
> > >>
> > >> On 11/24/20, 4:20 PM, "Frank van der Linden" <fllinden@xxxxxxxxxx> wrote:
> > >>
> > >> On Tue, Nov 24, 2020 at 08:50:36PM +0000, Kornievskaia, Olga wrote:
> > >>>
> > >>>
> > >>> On 11/24/20, 3:06 PM, "Frank van der Linden" <fllinden@xxxxxxxxxx> wrote:
> > >>>
> > >>> On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> By switching to an XFS-backed export, I am able to reproduce the
> > >>>> ibcomp worker crash on my client with xfstests generic/013.
> > >>>>
> > >>>> For the failing LISTXATTRS operation, xdr_inline_pages() is called
> > >>>> with page_len=12 and buflen=128. Then:
> > >>>>
> > >>>> - Because buflen is small, rpcrdma_marshal_req will not set up a
> > >>>>   Reply chunk, and rpcrdma's XDRBUF_SPARSE_PAGES logic does not
> > >>>>   get invoked at all.
> > >>>>
> > >>>> - Because page_len is non-zero, rpcrdma_inline_fixup() tries to
> > >>>>   copy received data into rq_rcv_buf->pages, but they're missing.
> > >>>>
> > >>>> The result is that the ibcomp worker faults and dies. Sometimes that
> > >>>> causes a visible crash, and sometimes it results in a transport
> > >>>> hang without other symptoms.
> > >>>>
> > >>>> RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and
> > >>>> should eventually be fixed or replaced. However, my preference is
> > >>>> that upper-layer operations should explicitly allocate their receive
> > >>>> buffers (using GFP_KERNEL) when possible, rather than relying on
> > >>>> XDRBUF_SPARSE_PAGES.
> > >>>>
> > >>>> Reported-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > >>>> Suggested-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > >>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > >>>> ---
> > >>>>  fs/nfs/nfs42proc.c | 17 ++++++++++-------
> > >>>>  fs/nfs/nfs42xdr.c  |  1 -
> > >>>>  2 files changed, 10 insertions(+), 8 deletions(-)
> > >>>>
> > >>>> Hi-
> > >>>>
> > >>>> I like Olga's proposed approach. What do you think of this patch?
> > >>>>
> > >>>> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
> > >>>> index 2b2211d1234e..24810305ec1c 100644
> > >>>> --- a/fs/nfs/nfs42proc.c
> > >>>> +++ b/fs/nfs/nfs42proc.c
> > >>>> @@ -1241,7 +1241,7 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
> > >>>>  		.rpc_resp = &res,
> > >>>>  	};
> > >>>>  	u32 xdrlen;
> > >>>> -	int ret, np;
> > >>>> +	int ret, np, i;
> > >>>>
> > >>>>
> > >>>>  	res.scratch = alloc_page(GFP_KERNEL);
> > >>>> @@ -1253,10 +1253,14 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
> > >>>>  	xdrlen = server->lxasize;
> > >>>>  	np = xdrlen / PAGE_SIZE + 1;
> > >>>>
> > >>>> +	ret = -ENOMEM;
> > >>>>  	pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
> > >>>> -	if (pages == NULL) {
> > >>>> -		__free_page(res.scratch);
> > >>>> -		return -ENOMEM;
> > >>>> +	if (pages == NULL)
> > >>>> +		goto out_free;
> > >>>> +	for (i = 0; i < np; i++) {
> > >>>> +		pages[i] = alloc_page(GFP_KERNEL);
> > >>>> +		if (!pages[i])
> > >>>> +			goto out_free;
> > >>>>  	}
> > >>>>
> > >>>>  	arg.xattr_pages = pages;
> > >>>> @@ -1271,14 +1275,13 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
> > >>>>  		*eofp = res.eof;
> > >>>>  	}
> > >>>>
> > >>>> +out_free:
> > >>>>  	while (--np >= 0) {
> > >>>>  		if (pages[np])
> > >>>>  			__free_page(pages[np]);
> > >>>>  	}
> > >>>> -
> > >>>> -	__free_page(res.scratch);
> > >>>>  	kfree(pages);
> > >>>> -
> > >>>> +	__free_page(res.scratch);
> > >>>>  	return ret;
> > >>>>
> > >>>>  }
> > >>>> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> > >>>> index 6e060a88f98c..8432bd6b95f0 100644
> > >>>> --- a/fs/nfs/nfs42xdr.c
> > >>>> +++ b/fs/nfs/nfs42xdr.c
> > >>>> @@ -1528,7 +1528,6 @@ static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,
> > >>>>
> > >>>>  	rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
> > >>>>  			hdr.replen);
> > >>>> -	req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
> > >>>>
> > >>>>  	encode_nops(&hdr);
> > >>>>  }
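As an aside, stripped of the NFS specifics, the pattern the patch adopts
is: allocate every receive page up front with GFP_KERNEL, and free
whatever was allocated on any exit path. Below is a minimal sketch of
that idiom; the helper names are hypothetical (only kcalloc(),
alloc_page(), __free_page() and kfree() are real kernel APIs):

#include <linux/mm.h>
#include <linux/slab.h>

/* Allocate an array of np receive pages; return NULL on failure. */
static struct page **recv_pages_alloc(unsigned int np)
{
	struct page **pages;
	unsigned int i;

	pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
	if (!pages)
		return NULL;
	for (i = 0; i < np; i++) {
		pages[i] = alloc_page(GFP_KERNEL);
		if (!pages[i])
			goto out_free;
	}
	return pages;

out_free:
	/* Free only the pages allocated before the failure. */
	while (i-- > 0)
		__free_page(pages[i]);
	kfree(pages);
	return NULL;
}

/* Release a fully populated page array. */
static void recv_pages_free(struct page **pages, unsigned int np)
{
	while (np-- > 0)
		__free_page(pages[np]);
	kfree(pages);
}

With helpers like these, neither the transport nor the XDR decoder ever
sees a missing receive page, which is exactly the guarantee the
XDRBUF_SPARSE_PAGES path failed to provide here.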
> > >>> I can see why this is the simplest and most pragmatic solution, so it's
> > >>> fine with me.
> > >>>
> > >>> Why doesn't this happen with getxattr? Do we need to convert that too?
> > >>>
> > >>> [olga] I don't know if GETXATTR/SETXATTR works. I'm not sure what tests
> > >>> exercise those operations. I just ran into the fact that generic/013
> > >>> wasn't passing, and I don't see that it's an xattr-specific test. I'm
> > >>> not sure how it ends up triggering usage of xattrs.
> > >>
> > >> I'm attaching the test program I used; it should give things a better
> > >> workout.
> > >>
> > >> [olga] I'm not sure if I'm doing something wrong, but there are only two
> > >> GETXATTR calls in the network trace from running this application, and
> > >> both calls return an error (ERR_NOXATTR). That, btw, might explain why
> > >> no problems are seen, since no decoding of data is happening. There are
> > >> lots of SETXATTRs and REMOVEXATTRs, and there is a LISTXATTR (which,
> > >> btw, the network trace marks as malformed, so there might be something
> > >> bad there). Anyway...
> > >>
> > >> This is my initial report: no real exercise of the GETXATTR code as far
> > >> as I can tell.
> > >
> > > True, the test is heavier on the setxattr / listxattr side. And with
> > > caching, you're not going to see a lot of GETXATTR calls. I used the same
> > > test program with caching off, and it works fine, though.
> >
> > I unintentionally broke GETXATTR while developing the LISTXATTRS fix,
> > and generic/013 rather aggressively informed me that GETXATTR was no
> > longer working. There is some test coverage there, fwiw.
>
> Oh, the coverage was good - in my testing I also used a collection of
> small unit test programs, and I was the one who made the xattr tests
> in xfstests work on NFS.

I have just oops-ed the kernel by sending a getxattr when userspace
provided a small buffer. The file with extended attributes was created
using your application, modified to leave the file behind. Then I coded
up the following to get the extended attributes. The existing test
coverage doesn't exercise this case.

#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/xattr.h>

int main(int argc, char *argv[])
{
	int fd, len = 8;
	char buf[8];

	fd = open("/mnt/test_xattr_probeJxfiVU", O_RDWR | O_CREAT, S_IRWXU);
	if (fd < 0)
		exit(0);

	if (getxattr("/mnt/test_xattr_probeJxfiVU", "user.test_xattr_probe",
		     buf, len) < 0)
		exit(0);

	return 0;
}

Which again produces the KASAN splat:

[ 5915.393103] BUG: KASAN: wild-memory-access in rpcrdma_complete_rqst+0x41b/0x680 [rpcrdma]

This is my proposed fix; I'll send a proper patch if agreed:

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 4fc61e3d098d..720fbaddcb90 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -1189,12 +1189,17 @@ static ssize_t _nfs42_proc_getxattr(struct inode *inode, const char *name,
 		.rpc_argp = &arg,
 		.rpc_resp = &res,
 	};
-	int ret, np;
+	int ret = -ENOMEM, np = NFS4XATTR_MAXPAGES, i;
 
+	for (i = 0; i < NFS4XATTR_MAXPAGES; i++) {
+		pages[i] = alloc_page(GFP_KERNEL);
+		if (!pages[i])
+			goto out_free_pages;
+	}
 	ret = nfs4_call_sync(server->client, server, &msg, &arg.seq_args,
 			     &res.seq_res, 0);
 	if (ret < 0)
-		return ret;
+		goto out_free_pages;
 
 	/*
 	 * Normally, the caching is done one layer up, but for successful
@@ -1209,16 +1214,19 @@ static ssize_t _nfs42_proc_getxattr(struct inode *inode, const char *name,
 		nfs4_xattr_cache_add(inode, name, NULL, pages, res.xattr_len);
 
 	if (buflen) {
-		if (res.xattr_len > buflen)
-			return -ERANGE;
+		if (res.xattr_len > buflen) {
+			ret = -ERANGE;
+			goto out_free_pages;
+		}
 		_copy_from_pages(buf, pages, 0, res.xattr_len);
 	}
 
-	np = DIV_ROUND_UP(res.xattr_len, PAGE_SIZE);
+	ret = res.xattr_len;
+out_free_pages:
 	while (--np >= 0)
 		__free_page(pages[np]);
 
-	return res.xattr_len;
+	return ret;
 }
 
 static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
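For context: the usual userspace idiom sidesteps the short-buffer case
by probing the attribute size with a zero-length getxattr() first (the
kernel must of course survive a short buffer regardless). A minimal
sketch, reusing the path and attribute name from the reproducer above:

#include <stdio.h>
#include <stdlib.h>
#include <sys/xattr.h>

int main(void)
{
	const char *path = "/mnt/test_xattr_probeJxfiVU";
	const char *name = "user.test_xattr_probe";
	ssize_t len;
	char *buf;

	/* First call: size == 0 asks only for the attribute's length. */
	len = getxattr(path, name, NULL, 0);
	if (len < 0) {
		perror("getxattr size probe");
		return 1;
	}

	buf = malloc(len);
	if (!buf)
		return 1;

	/* Second call: fetch into a buffer known to be large enough. */
	if (getxattr(path, name, buf, len) < 0) {
		perror("getxattr");
		free(buf);
		return 1;
	}
	printf("xattr is %zd bytes\n", len);
	free(buf);
	return 0;
}

(There is an inherent race if the attribute changes between the two
calls, which is why a correct kernel-side ERANGE path still matters.)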
> >
> > >
> > > In any case, after converting GETXATTR to pre-allocate pages, I noticed
> > > that, when I disabled caching, I was getting EIO instead of ERANGE back
> > > from calls that test the case of calling getxattr() with a buffer length
> > > that is insufficient.
> >
> > Was TCP the underlying RPC transport?
>
> Yes, this was TCP. I have set up rdma over rxe, which I'll test too if I
> can get this figured out. It might be a long-standing bug in
> xdr_inline_pages, as listxattr / getxattr seem to be simply the first
> ones to pass in a length that is not page aligned.
>
> It does make some sense to round the length up to PAGE_SIZE, and just
> check if the received data fits when decoding, like other calls do. It
> improves your chances of getting a result that you can still cache, even
> if it's too big for the length that was asked for. E.g. if the result is
> > requested_length, but < ROUND_UP(requested_length, PAGE_SIZE), you can
> cache it, even though you have to return -ERANGE to the caller.
>
> - Frank
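To make that concrete, here is a sketch of the decode-side policy Frank
describes. None of this is current kernel code; the function name and
the cacheable flag are illustrative only, with buflen being the length
the caller asked for and the receive pages sized to
round_up(buflen, PAGE_SIZE):

#include <linux/kernel.h>	/* round_up() */
#include <linux/mm.h>		/* PAGE_SIZE */

/*
 * Decide what to do with a reply of xattr_len bytes when the receive
 * pages were sized to round_up(buflen, PAGE_SIZE).
 */
static ssize_t xattr_reply_fixup(size_t xattr_len, size_t buflen,
				 bool *cacheable)
{
	size_t rcvlen = round_up(buflen, PAGE_SIZE);

	*cacheable = false;
	if (xattr_len > rcvlen)
		return -EIO;	/* reply overran the posted pages */

	/* The reply fits the pages, so it is complete and cacheable. */
	*cacheable = true;
	if (xattr_len > buflen)
		return -ERANGE;	/* cache it, but the caller's buffer is short */

	return xattr_len;
}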