Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 11/24/20, 4:20 PM, "Frank van der Linden" <fllinden@xxxxxxxxxx> wrote:

    NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.




    On Tue, Nov 24, 2020 at 08:50:36PM +0000, Kornievskaia, Olga wrote:
    >
    >
    > On 11/24/20, 3:06 PM, "Frank van der Linden" <fllinden@xxxxxxxxxx> wrote:
    >
    >     On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
    >     > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
    >     >
    >     >
    >     >
    >     > By switching to an XFS-backed export, I am able to reproduce the
    >     > ibcomp worker crash on my client with xfstests generic/013.
    >     >
    >     > For the failing LISTXATTRS operation, xdr_inline_pages() is called
    >     > with page_len=12 and buflen=128. Then:
    >     >
    >     > - Because buflen is small, rpcrdma_marshal_req will not set up a
    >     >   Reply chunk and the rpcrdma's XDRBUF_SPARSE_PAGES logic does not
    >     >   get invoked at all.
    >     >
    >     > - Because page_len is non-zero, rpcrdma_inline_fixup() tries to
    >     >   copy received data into rq_rcv_buf->pages, but they're missing.
    >     >
    >     > The result is that the ibcomp worker faults and dies. Sometimes that
    >     > causes a visible crash, and sometimes it results in a transport
    >     > hang without other symptoms.
    >     >
    >     > RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and
    >     > should eventually be fixed or replaced. However, my preference is
    >     > that upper-layer operations should explicitly allocate their receive
    >     > buffers (using GFP_KERNEL) when possible, rather than relying on
    >     > XDRBUF_SPARSE_PAGES.
    >     >
    >     > Reported-by: Olga kornievskaia <kolga@xxxxxxxxxx>
    >     > Suggested-by: Olga kornievskaia <kolga@xxxxxxxxxx>
    >     > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
    >     > ---
    >     >  fs/nfs/nfs42proc.c |   17 ++++++++++-------
    >     >  fs/nfs/nfs42xdr.c  |    1 -
    >     >  2 files changed, 10 insertions(+), 8 deletions(-)
    >     >
    >     > Hi-
    >     >
    >     > I like Olga's proposed approach. What do you think of this patch?
    >     >
    >     >
    >     > diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
    >     > index 2b2211d1234e..24810305ec1c 100644
    >     > --- a/fs/nfs/nfs42proc.c
    >     > +++ b/fs/nfs/nfs42proc.c
    >     > @@ -1241,7 +1241,7 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
    >     >                 .rpc_resp       = &res,
    >     >         };
    >     >         u32 xdrlen;
    >     > -       int ret, np;
    >     > +       int ret, np, i;
    >     >
    >     >
    >     >         res.scratch = alloc_page(GFP_KERNEL);
    >     > @@ -1253,10 +1253,14 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
    >     >                 xdrlen = server->lxasize;
    >     >         np = xdrlen / PAGE_SIZE + 1;
    >     >
    >     > +       ret = -ENOMEM;
    >     >         pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
    >     > -       if (pages == NULL) {
    >     > -               __free_page(res.scratch);
    >     > -               return -ENOMEM;
    >     > +       if (pages == NULL)
    >     > +               goto out_free;
    >     > +       for (i = 0; i < np; i++) {
    >     > +               pages[i] = alloc_page(GFP_KERNEL);
    >     > +               if (!pages[i])
    >     > +                       goto out_free;
    >     >         }
    >     >
    >     >         arg.xattr_pages = pages;
    >     > @@ -1271,14 +1275,13 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
    >     >                 *eofp = res.eof;
    >     >         }
    >     >
    >     > +out_free:
    >     >         while (--np >= 0) {
    >     >                 if (pages[np])
    >     >                         __free_page(pages[np]);
    >     >         }
    >     > -
    >     > -       __free_page(res.scratch);
    >     >         kfree(pages);
    >     > -
    >     > +       __free_page(res.scratch);
    >     >         return ret;
    >     >
    >     >  }
    >     > diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
    >     > index 6e060a88f98c..8432bd6b95f0 100644
    >     > --- a/fs/nfs/nfs42xdr.c
    >     > +++ b/fs/nfs/nfs42xdr.c
    >     > @@ -1528,7 +1528,6 @@ static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,
    >     >
    >     >         rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
    >     >             hdr.replen);
    >     > -       req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
    >     >
    >     >         encode_nops(&hdr);
    >     >  }
    >     >
    >     >
    >
    >     I can see why this is the simplest and most pragmatic solution, so it's
    >     fine with me.
    >
    >     Why doesn't this happen with getxattr? Do we need to convert that too?
    >
    > [olga] I don't know if GETXATTR/SETXATTR works. I'm not sure what tests exercise those operations. I just ran into the fact that generic/013 wasn't passing. And I don't see that it's an xattr specific tests. I'm not sure how it ends up triggering is usage of xattr.

    I'm attaching the test program I used, it should give things a better workout.

[olga] I'm not sure if I'm doing something wrong but there are only 2 GETXATTR call on the network trace from running this application and both calls are returning an error (ERR_NOXATTR). Which btw might explain why no problems are seen since no decoding of data is happening. There are lots of SETXATTRs and REMOVEXATTR and there is a LISTXATTR (which btw network trace is marking as malformed so there might something bad there). Anyway... 

This is my initial report: no real exercise of the GETXATTR code as far as I can tell.

    - Frank





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux