Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation

Frank van der Linden <fllinden@xxxxxxxxxx> · Tue, 24 Nov 2020 21:14:03 +0000

On Tue, Nov 24, 2020 at 03:18:55PM -0500, Chuck Lever wrote:
> > On Nov 24, 2020, at 3:06 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
> >
> > On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
> >> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> >>
> >>
> >>
> >> By switching to an XFS-backed export, I am able to reproduce the
> >> ibcomp worker crash on my client with xfstests generic/013.
> >>
> >> For the failing LISTXATTRS operation, xdr_inline_pages() is called
> >> with page_len=12 and buflen=128. Then:
> >>
> >> - Because buflen is small, rpcrdma_marshal_req will not set up a
> >>  Reply chunk and the rpcrdma's XDRBUF_SPARSE_PAGES logic does not
> >>  get invoked at all.
> >>
> >> - Because page_len is non-zero, rpcrdma_inline_fixup() tries to
> >>  copy received data into rq_rcv_buf->pages, but they're missing.
> >>
> >> The result is that the ibcomp worker faults and dies. Sometimes that
> >> causes a visible crash, and sometimes it results in a transport
> >> hang without other symptoms.
> >>
> >> RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and
> >> should eventually be fixed or replaced. However, my preference is
> >> that upper-layer operations should explicitly allocate their receive
> >> buffers (using GFP_KERNEL) when possible, rather than relying on
> >> XDRBUF_SPARSE_PAGES.
> >>
> >> Reported-by: Olga kornievskaia <kolga@xxxxxxxxxx>
> >> Suggested-by: Olga kornievskaia <kolga@xxxxxxxxxx>
> >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >> ---
> >> fs/nfs/nfs42proc.c |   17 ++++++++++-------
> >> fs/nfs/nfs42xdr.c  |    1 -
> >> 2 files changed, 10 insertions(+), 8 deletions(-)
> >>
> >> Hi-
> >>
> >> I like Olga's proposed approach. What do you think of this patch?
> >>
> >>
> >> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
> >> index 2b2211d1234e..24810305ec1c 100644
> >> --- a/fs/nfs/nfs42proc.c
> >> +++ b/fs/nfs/nfs42proc.c
> >> @@ -1241,7 +1241,7 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
> >>                .rpc_resp       = &res,
> >>        };
> >>        u32 xdrlen;
> >> -       int ret, np;
> >> +       int ret, np, i;
> >>
> >>
> >>        res.scratch = alloc_page(GFP_KERNEL);
> >> @@ -1253,10 +1253,14 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
> >>                xdrlen = server->lxasize;
> >>        np = xdrlen / PAGE_SIZE + 1;
> >>
> >> +       ret = -ENOMEM;
> >>        pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
> >> -       if (pages == NULL) {
> >> -               __free_page(res.scratch);
> >> -               return -ENOMEM;
> >> +       if (pages == NULL)
> >> +               goto out_free;
> >> +       for (i = 0; i < np; i++) {
> >> +               pages[i] = alloc_page(GFP_KERNEL);
> >> +               if (!pages[i])
> >> +                       goto out_free;
> >>        }
> >>
> >>        arg.xattr_pages = pages;
> >> @@ -1271,14 +1275,13 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
> >>                *eofp = res.eof;
> >>        }
> >>
> >> +out_free:
> >>        while (--np >= 0) {
> >>                if (pages[np])
> >>                        __free_page(pages[np]);
> >>        }
> >> -
> >> -       __free_page(res.scratch);
> >>        kfree(pages);
> >> -
> >> +       __free_page(res.scratch);
> >>        return ret;
> >>
> >> }
> >> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> >> index 6e060a88f98c..8432bd6b95f0 100644
> >> --- a/fs/nfs/nfs42xdr.c
> >> +++ b/fs/nfs/nfs42xdr.c
> >> @@ -1528,7 +1528,6 @@ static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,
> >>
> >>        rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
> >>            hdr.replen);
> >> -       req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
> >>
> >>        encode_nops(&hdr);
> >> }
> >>
> >>
> >
> > I can see why this is the simplest and most pragmatic solution, so it's
> > fine with me.
> 
> Thanks. I've added Olga's Reviewed/Tested-by to my local copy.
> May I add a Reviewed-by from you?

Yep, thanks.

> > Why doesn't this happen with getxattr? Do we need to convert that too?
> 
> If a GETXATTR request can be generated such that buflen is less than
> a page, but page_len > 0, then yes, it will happen there too. As Olga
> says, NFS is unaware of the transport's inline threshold setting, so
> there will always be some buflen value under which bad things happen.
> 
> Either way, I would prefer to see GETXATTR converted to avoid using
> XDRBUF_SPARSE_PAGES. Does NFSv4 GETACL provide a good example?

Hm, in that case, it should be converted. That's easy enough to do.
I'll send a small patch.

> 
> 
> > The basic issue here is that the RPC code does not deal with inlined data
> > that exceeds PAGE_SIZE. That can only be done with raw pages.
> >
> > Since the upper layer has already allocated a buffer in the case of listxattr
> > and getxattr, I would love to be able to just XDR code in to that buffer,
> > and void the whole alloc+copy situation.
> 
> Do you mean like the READDIR entry encoders and decoders?

It works out for READDIR - you use page cache fillers that do the XDR
decoding in to pages that are part of i_mapping. For xattrs, that's
not a good fit - I looked at it, but it would have been messy.

The main thing I'm complaining about here is that the upper-layer xattr
code already kvallocs a buffer, but because of restrictions in the XDR
code, you always end up doing an extra copy for get/set, since you need
to allocate pages. But, this doesn't apply to listxattr.

> 
> 
> > But sadly, it might be > PAGE_SIZE,
> > so the XDR code doesn't allow it. It's not all bad, having to use pages
> > allows them to be directly hooked in to the cache in the case of getxattr,
> > but for listxattr, decoding directly in to the provided buffer would be nice.
> >
> > Hm, I wonder if that restriction actually holds for listxattr - the invidual
> > XDR items (xattr names) should never exceed PAGE_SIZE..
> 
> You'll have to worry about XDR data items crossing page boundaries.
> Our XDR code uses a scratch buffer for that, so it can be handled
> transparently.

Yep, and this is what listxattr does. I was confusing the get/set and list
cases, sorry about the confusion.

Let me just quickly convert getxattr.

- Frank