On Fri, Nov 27, 2020 at 3:40 AM Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
>
> On Thu, Nov 26, 2020 at 12:10:21PM -0500, Chuck Lever wrote:
> >
> >
> > > On Nov 25, 2020, at 7:21 PM, Frank van der Linden <fllinden@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Nov 24, 2020 at 10:40:25PM +0000, Kornievskaia, Olga wrote:
> > >>
> > >>
> > >> On 11/24/20, 4:20 PM, "Frank van der Linden" <fllinden@xxxxxxxxxx> wrote:
> > >>
> > >> On Tue, Nov 24, 2020 at 08:50:36PM +0000, Kornievskaia, Olga wrote:
> > >>>
> > >>>
> > >>> On 11/24/20, 3:06 PM, "Frank van der Linden" <fllinden@xxxxxxxxxx> wrote:
> > >>>
> > >>> On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> By switching to an XFS-backed export, I am able to reproduce the
> > >>>> ibcomp worker crash on my client with xfstests generic/013.
> > >>>>
> > >>>> For the failing LISTXATTRS operation, xdr_inline_pages() is called
> > >>>> with page_len=12 and buflen=128. Then:
> > >>>>
> > >>>> - Because buflen is small, rpcrdma_marshal_req will not set up a
> > >>>>   Reply chunk, and rpcrdma's XDRBUF_SPARSE_PAGES logic does not
> > >>>>   get invoked at all.
> > >>>>
> > >>>> - Because page_len is non-zero, rpcrdma_inline_fixup() tries to
> > >>>>   copy received data into rq_rcv_buf->pages, but they're missing.
> > >>>>
> > >>>> The result is that the ibcomp worker faults and dies. Sometimes that
> > >>>> causes a visible crash, and sometimes it results in a transport
> > >>>> hang without other symptoms.
> > >>>>
> > >>>> RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and
> > >>>> should eventually be fixed or replaced. However, my preference is
> > >>>> that upper-layer operations should explicitly allocate their receive
> > >>>> buffers (using GFP_KERNEL) when possible, rather than relying on
> > >>>> XDRBUF_SPARSE_PAGES.
> > >>>>
> > >>>> Reported-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > >>>> Suggested-by: Olga Kornievskaia <kolga@xxxxxxxxxx>
> > >>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> > >>>> ---
> > >>>>  fs/nfs/nfs42proc.c | 17 ++++++++++-------
> > >>>>  fs/nfs/nfs42xdr.c  |  1 -
> > >>>>  2 files changed, 10 insertions(+), 8 deletions(-)
> > >>>>
> > >>>> Hi-
> > >>>>
> > >>>> I like Olga's proposed approach. What do you think of this patch?
> > >>>>
> > >>>> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
> > >>>> index 2b2211d1234e..24810305ec1c 100644
> > >>>> --- a/fs/nfs/nfs42proc.c
> > >>>> +++ b/fs/nfs/nfs42proc.c
> > >>>> @@ -1241,7 +1241,7 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
> > >>>>  		.rpc_resp = &res,
> > >>>>  	};
> > >>>>  	u32 xdrlen;
> > >>>> -	int ret, np;
> > >>>> +	int ret, np, i;
> > >>>>
> > >>>>
> > >>>>  	res.scratch = alloc_page(GFP_KERNEL);
> > >>>> @@ -1253,10 +1253,14 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
> > >>>>  	xdrlen = server->lxasize;
> > >>>>  	np = xdrlen / PAGE_SIZE + 1;
> > >>>>
> > >>>> +	ret = -ENOMEM;
> > >>>>  	pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
> > >>>> -	if (pages == NULL) {
> > >>>> -		__free_page(res.scratch);
> > >>>> -		return -ENOMEM;
> > >>>> +	if (pages == NULL)
> > >>>> +		goto out_free;
> > >>>> +	for (i = 0; i < np; i++) {
> > >>>> +		pages[i] = alloc_page(GFP_KERNEL);
> > >>>> +		if (!pages[i])
> > >>>> +			goto out_free;
> > >>>>  	}
> > >>>>
> > >>>>  	arg.xattr_pages = pages;
> > >>>> @@ -1271,14 +1275,13 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
> > >>>>  		*eofp = res.eof;
> > >>>>  	}
> > >>>>
> > >>>> +out_free:
> > >>>>  	while (--np >= 0) {
> > >>>>  		if (pages[np])
> > >>>>  			__free_page(pages[np]);
> > >>>>  	}
> > >>>> -
> > >>>> -	__free_page(res.scratch);
> > >>>>  	kfree(pages);
> > >>>> -
> > >>>> +	__free_page(res.scratch);
> > >>>>  	return ret;
> > >>>>
> > >>>>  }
> > >>>> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> > >>>> index 6e060a88f98c..8432bd6b95f0 100644
> > >>>> --- a/fs/nfs/nfs42xdr.c
> > >>>> +++ b/fs/nfs/nfs42xdr.c
> > >>>> @@ -1528,7 +1528,6 @@ static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,
> > >>>>
> > >>>>  	rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
> > >>>>  			hdr.replen);
> > >>>> -	req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
> > >>>>
> > >>>>  	encode_nops(&hdr);
> > >>>>  }
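As an aside, stripped of the NFS specifics, the pattern the patch adopts
is: allocate every receive page up front with GFP_KERNEL, and free
whatever was allocated on any exit path. Below is a minimal sketch of
that idiom; the helper names are hypothetical (only kcalloc(),
alloc_page(), __free_page() and kfree() are real kernel APIs):

#include <linux/mm.h>
#include <linux/slab.h>

/* Allocate an array of np receive pages; return NULL on failure. */
static struct page **recv_pages_alloc(unsigned int np)
{
	struct page **pages;
	unsigned int i;

	pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
	if (!pages)
		return NULL;
	for (i = 0; i < np; i++) {
		pages[i] = alloc_page(GFP_KERNEL);
		if (!pages[i])
			goto out_free;
	}
	return pages;

out_free:
	/* Free only the pages allocated before the failure. */
	while (i-- > 0)
		__free_page(pages[i]);
	kfree(pages);
	return NULL;
}

/* Release a fully populated page array. */
static void recv_pages_free(struct page **pages, unsigned int np)
{
	while (np-- > 0)
		__free_page(pages[np]);
	kfree(pages);
}

With helpers like these, neither the transport nor the XDR decoder ever
sees a missing receive page, which is exactly the guarantee the
XDRBUF_SPARSE_PAGES path failed to provide here.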
> > >>> I can see why this is the simplest and most pragmatic solution, so it's
> > >>> fine with me.
> > >>>
> > >>> Why doesn't this happen with getxattr? Do we need to convert that too?
> > >>>
> > >>> [olga] I don't know if GETXATTR/SETXATTR works. I'm not sure what tests
> > >>> exercise those operations. I just ran into the fact that generic/013
> > >>> wasn't passing, and I don't see that it's an xattr-specific test. I'm
> > >>> not sure how it ends up triggering usage of xattrs.
> > >>
> > >> I'm attaching the test program I used; it should give things a better
> > >> workout.
> > >>
> > >> [olga] I'm not sure if I'm doing something wrong, but there are only two
> > >> GETXATTR calls in the network trace from running this application, and
> > >> both calls return an error (ERR_NOXATTR). That, btw, might explain why
> > >> no problems are seen, since no decoding of data is happening. There are
> > >> lots of SETXATTRs and REMOVEXATTRs, and there is a LISTXATTR (which,
> > >> btw, the network trace marks as malformed, so there might be something
> > >> bad there). Anyway...
> > >>
> > >> This is my initial report: no real exercise of the GETXATTR code as far
> > >> as I can tell.
> > >
> > > True, the test is heavier on the setxattr / listxattr side. And with
> > > caching, you're not going to see a lot of GETXATTR calls. I used the same
> > > test program with caching off, and it works fine, though.
> >
> > I unintentionally broke GETXATTR while developing the LISTXATTRS fix,
> > and generic/013 rather aggressively informed me that GETXATTR was no
> > longer working. There is some test coverage there, fwiw.
>
> Oh, the coverage was good - in my testing I also used a collection of
> small unit test programs, and I was the one who made the xattr tests
> in xfstests work on NFS.

I have just oops-ed the kernel by sending a getxattr when userspace
provided a small buffer. The file with extended attributes was created
using your application, modified to leave the file behind. Then I coded
up the following to get the extended attributes. The existing test
coverage doesn't exercise this case.

#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/xattr.h>

int main(int argc, char *argv[])
{
	int fd, len = 8;
	char buf[8];

	fd = open("/mnt/test_xattr_probeJxfiVU", O_RDWR | O_CREAT, S_IRWXU);
	if (fd < 0)
		exit(0);

	if (getxattr("/mnt/test_xattr_probeJxfiVU", "user.test_xattr_probe",
		     buf, len) < 0)
		exit(0);

	return 0;
}

Which again produces the KASAN splat:

[ 5915.393103] BUG: KASAN: wild-memory-access in rpcrdma_complete_rqst+0x41b/0x680 [rpcrdma]

This is my proposed fix; I'll send a proper patch if agreed:

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 4fc61e3d098d..720fbaddcb90 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -1189,12 +1189,17 @@ static ssize_t _nfs42_proc_getxattr(struct inode *inode, const char *name,
 		.rpc_argp = &arg,
 		.rpc_resp = &res,
 	};
-	int ret, np;
+	int ret = -ENOMEM, np = NFS4XATTR_MAXPAGES, i;
 
+	for (i = 0; i < NFS4XATTR_MAXPAGES; i++) {
+		pages[i] = alloc_page(GFP_KERNEL);
+		if (!pages[i])
+			goto out_free_pages;
+	}
 	ret = nfs4_call_sync(server->client, server, &msg, &arg.seq_args,
 			     &res.seq_res, 0);
 	if (ret < 0)
-		return ret;
+		goto out_free_pages;
 
 	/*
 	 * Normally, the caching is done one layer up, but for successful
@@ -1209,16 +1214,19 @@ static ssize_t _nfs42_proc_getxattr(struct inode *inode, const char *name,
 		nfs4_xattr_cache_add(inode, name, NULL, pages, res.xattr_len);
 
 	if (buflen) {
-		if (res.xattr_len > buflen)
-			return -ERANGE;
+		if (res.xattr_len > buflen) {
+			ret = -ERANGE;
+			goto out_free_pages;
+		}
 		_copy_from_pages(buf, pages, 0, res.xattr_len);
 	}
 
-	np = DIV_ROUND_UP(res.xattr_len, PAGE_SIZE);
+	ret = res.xattr_len;
+out_free_pages:
 	while (--np >= 0)
 		__free_page(pages[np]);
 
-	return res.xattr_len;
+	return ret;
 }
 
 static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
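For context: the usual userspace idiom sidesteps the short-buffer case
by probing the attribute size with a zero-length getxattr() first (the
kernel must of course survive a short buffer regardless). A minimal
sketch, reusing the path and attribute name from the reproducer above:

#include <stdio.h>
#include <stdlib.h>
#include <sys/xattr.h>

int main(void)
{
	const char *path = "/mnt/test_xattr_probeJxfiVU";
	const char *name = "user.test_xattr_probe";
	ssize_t len;
	char *buf;

	/* First call: size == 0 asks only for the attribute's length. */
	len = getxattr(path, name, NULL, 0);
	if (len < 0) {
		perror("getxattr size probe");
		return 1;
	}

	buf = malloc(len);
	if (!buf)
		return 1;

	/* Second call: fetch into a buffer known to be large enough. */
	if (getxattr(path, name, buf, len) < 0) {
		perror("getxattr");
		free(buf);
		return 1;
	}
	printf("xattr is %zd bytes\n", len);
	free(buf);
	return 0;
}

(There is an inherent race if the attribute changes between the two
calls, which is why a correct kernel-side ERANGE path still matters.)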
> >
> > >
> > > In any case, after converting GETXATTR to pre-allocate pages, I noticed
> > > that, when I disabled caching, I was getting EIO instead of ERANGE back
> > > from calls that test the case of calling getxattr() with a buffer length
> > > that is insufficient.
> >
> > Was TCP the underlying RPC transport?
>
> Yes, this was TCP. I have set up rdma over rxe, which I'll test too if I
> can get this figured out. It might be a long-standing bug in
> xdr_inline_pages, as listxattr / getxattr seem to be simply the first
> ones to pass in a length that is not page aligned.
>
> It does make some sense to round the length up to PAGE_SIZE, and just
> check if the received data fits when decoding, like other calls do. It
> improves your chances of getting a result that you can still cache, even
> if it's too big for the length that was asked for. E.g. if the result is
> > requested_length, but < ROUND_UP(requested_length, PAGE_SIZE), you can
> cache it, even though you have to return -ERANGE to the caller.
>
> - Frank
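To make that concrete, here is a sketch of the decode-side policy Frank
describes. None of this is current kernel code; the function name and
the cacheable flag are illustrative only, with buflen being the length
the caller asked for and the receive pages sized to
round_up(buflen, PAGE_SIZE):

#include <linux/kernel.h>	/* round_up() */
#include <linux/mm.h>		/* PAGE_SIZE */

/*
 * Decide what to do with a reply of xattr_len bytes when the receive
 * pages were sized to round_up(buflen, PAGE_SIZE).
 */
static ssize_t xattr_reply_fixup(size_t xattr_len, size_t buflen,
				 bool *cacheable)
{
	size_t rcvlen = round_up(buflen, PAGE_SIZE);

	*cacheable = false;
	if (xattr_len > rcvlen)
		return -EIO;	/* reply overran the posted pages */

	/* The reply fits the pages, so it is complete and cacheable. */
	*cacheable = true;
	if (xattr_len > buflen)
		return -ERANGE;	/* cache it, but the caller's buffer is short */

	return xattr_len;
}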