On Tue, Dec 8, 2020 at 2:59 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > > > > > On Dec 8, 2020, at 2:49 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: > > > > On Sat, Dec 5, 2020 at 3:24 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > >> > >> Olga K. observed that rpcrdma_marsh_req() allocates sparse pages > >> only when it has determined that a Reply chunk is necessary. There > >> are plenty of cases where no Reply chunk is needed, but the > >> XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in > >> rpcrdma_inline_fixup() when it tries to copy parts of the received > >> Reply into a missing page. > >> > >> To avoid crashing, handle sparse page allocation up front. > >> > >> Until XATTR support was added, this issue did not appear often > >> because the only SPARSE_PAGES consumer always expected a reply large > >> enough to always require a Reply chunk. > > > > Ok so given that ACL never utilizes this, the only way to test this is > > to remove the xattr fixes and try this and run the xfstest generic/013 > > would that be correct? > > I'm simply hoisting the SPARSE_PAGES logic out of convert_iovs and into > marshal_req, to ensure that it is visited no matter what chunk type is > being encoded. NFSv3 GETACL would use this new code path instead of the > old one, so it would still get exercised. > > > > If so, then just applying this patch on top of pure say -rc4 produces problems. > > I don't follow you. What problems would occur? On -rc4, the LISTXATTRS > crash you found should go away after this patch is applied. This is my git (on top of Trond's origin/testing though might not be totally current, but still -rc4): commit 0abdda88e25673a8daed06481ba2e91152a311d6 (HEAD -> 12082020-deleteme) Author: Chuck Lever <chuck.lever@xxxxxxxxxx> Date: Sat Dec 5 15:22:57 2020 -0500 xprtrdma: Fix XDRBUF_SPARSE_PAGES support Olga K. observed that rpcrdma_marsh_req() allocates sparse pages only when it has determined that a Reply chunk is necessary. There are plenty of cases where no Reply chunk is needed, but the XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in rpcrdma_inline_fixup() when it tries to copy parts of the received Reply into a missing page. To avoid crashing, handle sparse page allocation up front. Until XATTR support was added, this issue did not appear often because the only SPARSE_PAGES consumer always expected a reply large enough to always require a Reply chunk. Reported-by: Olga Kornievskaia <kolga@xxxxxxxxxx> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> Cc: <stable@xxxxxxxxxxxxxxx> commit 66ff5a8a60a4eaa26d19a8fd8aff30ce319b14fb Author: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> Date: Sun Nov 15 17:04:35 2020 -0500 Then I run generic/013 and it crashes with: [ 107.792204] run fstests generic/013 at 2020-12-08 15:12:07 [ 108.330372] ================================================================== [ 108.333913] BUG: KASAN: slab-out-of-bounds in rpcrdma_alloc_sparse_pages+0x5d/0xa0 [rpcrdma] [ 108.337696] Read of size 8 at addr ffff88801a16a098 by task fsstress/2999 [ 108.341357] [ 108.342138] CPU: 0 PID: 2999 Comm: fsstress Not tainted 5.10.0-rc4+ #54 [ 108.345725] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020 [ 108.351232] Call Trace: [ 108.352322] dump_stack+0x7c/0xa2 [ 108.353935] ? rpcrdma_alloc_sparse_pages+0x5d/0xa0 [rpcrdma] [ 108.356870] print_address_description.constprop.7+0x1e/0x230 [ 108.359354] ? record_print_text.cold.38+0x11/0x11 [ 108.361437] ? _raw_write_lock_irqsave+0xe0/0xe0 [ 108.363665] ? rpcrdma_alloc_sparse_pages+0x5d/0xa0 [rpcrdma] [ 108.367476] ? rpcrdma_alloc_sparse_pages+0x5d/0xa0 [rpcrdma] [ 108.370174] kasan_report.cold.9+0x37/0x7c [ 108.372428] ? _raw_spin_lock_irqsave+0x80/0xe0 [ 108.374489] ? rpcrdma_alloc_sparse_pages+0x5d/0xa0 [rpcrdma] [ 108.376978] rpcrdma_alloc_sparse_pages+0x5d/0xa0 [rpcrdma] [ 108.379404] rpcrdma_marshal_req+0xbdb/0x1450 [rpcrdma] [ 108.381941] ? _raw_spin_lock+0x7a/0xd0 [ 108.383628] ? _raw_spin_lock+0x7a/0xd0 [ 108.385305] ? _raw_write_lock_bh+0xe0/0xe0 [ 108.387184] ? rpcrdma_prepare_send_sges+0x7e0/0x7e0 [rpcrdma] [ 108.390980] ? call_bind+0x60/0xf0 [sunrpc] [ 108.392840] xprt_rdma_send_request+0x79/0x190 [rpcrdma] [ 108.395195] xprt_transmit+0x2ae/0x6c0 [sunrpc] [ 108.397639] ? call_bind+0xf0/0xf0 [sunrpc] [ 108.399581] call_transmit+0xdd/0x110 [sunrpc] [ 108.401583] ? call_bind+0xf0/0xf0 [sunrpc] [ 108.403518] __rpc_execute+0x11c/0x6e0 [sunrpc] [ 108.406016] ? trace_event_raw_event_xprt_cong_event+0x270/0x270 [sunrpc] [ 108.409178] ? rpc_make_runnable+0x54/0xe0 [sunrpc] [ 108.411599] rpc_run_task+0x29c/0x2c0 [sunrpc] [ 108.414020] nfs4_call_sync_custom+0xc/0x40 [nfsv4] [ 108.417269] nfs4_do_call_sync+0x114/0x160 [nfsv4] [ 108.419665] ? nfs4_call_sync_custom+0x40/0x40 [nfsv4] [ 108.423873] ? __alloc_pages_nodemask+0x200/0x410 [ 108.426445] ? kasan_unpoison_shadow+0x30/0x40 [ 108.428737] ? __kasan_kmalloc.constprop.8+0xc1/0xd0 [ 108.431076] _nfs42_proc_listxattrs+0x1f6/0x2f0 [nfsv4] [ 108.433381] ? kasan_set_free_info+0x1b/0x30 [ 108.435278] ? nfs42_offload_cancel_done+0x50/0x50 [nfsv4] [ 108.438490] ? page_put_link+0x90/0x90 [ 108.440328] ? lockref_get+0x120/0x120 [ 108.442344] ? _raw_spin_lock+0x7a/0xd0 [ 108.444199] nfs42_proc_listxattrs+0xf4/0x150 [nfsv4] [ 108.447745] ? nfs42_proc_setxattr+0x150/0x150 [nfsv4] [ 108.450851] ? nfs4_xattr_cache_list+0x21/0x120 [nfsv4] [ 108.454063] nfs4_listxattr+0x34d/0x3d0 [nfsv4] [ 108.456597] ? _nfs4_proc_access+0x260/0x260 [nfsv4] [ 108.458953] ? __ia32_sys_rename+0x40/0x40 [ 108.460722] ? __ia32_sys_lstat+0x30/0x30 [ 108.462696] ? __check_object_size+0x178/0x220 [ 108.464679] ? strncpy_from_user+0xe9/0x230 [ 108.466576] ? security_inode_listxattr+0x20/0x60 [ 108.468639] listxattr+0xd1/0xf0 [ 108.470303] path_listxattr+0xa1/0x100 [ 108.471946] ? listxattr+0xf0/0xf0 [ 108.473498] do_syscall_64+0x33/0x40 [ 108.475079] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 108.477447] RIP: 0033:0x7f1c8b904c8b [ 108.479147] Code: f0 ff ff 73 01 c3 48 8b 0d fa 21 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d cd 21 2c 00 f7 d8 64 89 01 48 [ 108.488040] RSP: 002b:00007fffef42b718 EFLAGS: 00000206 ORIG_RAX: 00000000000000c2 [ 108.491606] RAX: ffffffffffffffda RBX: 000000000000001e RCX: 00007f1c8b904c8b [ 108.495231] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000d8b440 [ 108.498424] RBP: 00000000000001f4 R08: 00007f1c8bbc7ba0 R09: 00007fffef42b367 [ 108.501888] R10: 0000000000000004 R11: 0000000000000206 R12: 000000000000001e [ 108.505872] R13: 0000000000403e30 R14: 0000000000000000 R15: 0000000000000000 [ 108.508933] [ 108.509616] Allocated by task 2999: [ 108.511161] kasan_save_stack+0x19/0x40 [ 108.513068] __kasan_kmalloc.constprop.8+0xc1/0xd0 [ 108.515313] _nfs42_proc_listxattrs+0x1b2/0x2f0 [nfsv4] [ 108.518041] nfs42_proc_listxattrs+0xf4/0x150 [nfsv4] [ 108.520832] nfs4_listxattr+0x34d/0x3d0 [nfsv4] [ 108.522954] listxattr+0xd1/0xf0 [ 108.524684] path_listxattr+0xa1/0x100 [ 108.526699] do_syscall_64+0x33/0x40 [ 108.528435] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 108.530892] [ 108.531671] The buggy address belongs to the object at ffff88801a16a000 [ 108.531671] which belongs to the cache kmalloc-192 of size 192 [ 108.537195] The buggy address is located 152 bytes inside of [ 108.537195] 192-byte region [ffff88801a16a000, ffff88801a16a0c0) [ 108.542548] The buggy address belongs to the page: [ 108.544756] page:000000006639f664 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1a168 [ 108.548725] head:000000006639f664 order:2 compound_mapcount:0 compound_pincount:0 [ 108.551925] flags: 0xfffffc0010200(slab|head) [ 108.553980] raw: 000fffffc0010200 ffffea0000111700 0000000800000008 ffff88800104f5c0 [ 108.557364] raw: 0000000000000000 0000000080400040 00000001ffffffff ffff88804ba9b001 [ 108.560996] page dumped because: kasan: bad access detected [ 108.564008] page->mem_cgroup:ffff88804ba9b001 [ 108.566736] [ 108.567607] Memory state around the buggy address: [ 108.569839] ffff88801a169f80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 108.573164] ffff88801a16a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 108.576348] >ffff88801a16a080: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc [ 108.579457] ^ [ 108.581212] ffff88801a16a100: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 108.584368] ffff88801a16a180: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 108.588425] ================================================================== > > > > This patch on top of all the fixes (getxattr + listxattr patches), > > seems ok but I'm not sure if it gets exercised. > >> > >> Reported-by: Olga Kornievskaia <kolga@xxxxxxxxxx> > >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > >> Cc: <stable@xxxxxxxxxxxxxxx> > >> --- > >> net/sunrpc/xdr.c | 1 + > >> net/sunrpc/xprtrdma/rpc_rdma.c | 41 +++++++++++++++++++++++++++++++--------- > >> 2 files changed, 33 insertions(+), 9 deletions(-) > >> > >> Changes since v3: > >> - I swear I am not drunk. I forgot to commit the change before mailing it. > >> > >> Changes since v2: > >> - Actually fix the xdr_buf_pagecount() problem > >> > >> Changes since RFC: > >> - Ensure xdr_buf_pagecount() is defined in rpc_rdma.c > >> - noinline the sparse page allocator -- it's an uncommon path > >> > >> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c > >> index 71e03b930b70..878f4c4fec1a 100644 > >> --- a/net/sunrpc/xdr.c > >> +++ b/net/sunrpc/xdr.c > >> @@ -141,6 +141,7 @@ xdr_buf_pagecount(struct xdr_buf *buf) > >> return 0; > >> return (buf->page_base + buf->page_len + PAGE_SIZE - 1) >> PAGE_SHIFT; > >> } > >> +EXPORT_SYMBOL_GPL(xdr_buf_pagecount); > >> > >> int > >> xdr_alloc_bvec(struct xdr_buf *buf, gfp_t gfp) > >> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c > >> index 0f5120c7668f..6c9a1810a70a 100644 > >> --- a/net/sunrpc/xprtrdma/rpc_rdma.c > >> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c > >> @@ -179,6 +179,32 @@ rpcrdma_nonpayload_inline(const struct rpcrdma_xprt *r_xprt, > >> r_xprt->rx_ep->re_max_inline_recv; > >> } > >> > >> +/* ACL likes to be lazy in allocating pages. For TCP, these > >> + * pages can be allocated during receive processing. Not true > >> + * for RDMA, which must always provision receive buffers > >> + * up front. > >> + */ > >> +static noinline int > >> +rpcrdma_alloc_sparse_pages(struct rpc_rqst *rqst) > >> +{ > >> + struct xdr_buf *xb = &rqst->rq_rcv_buf; > >> + struct page **ppages; > >> + int len; > >> + > >> + len = xb->page_len; > >> + ppages = xb->pages + xdr_buf_pagecount(xb); > >> + while (len > 0) { > >> + if (!*ppages) > >> + *ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN); > >> + if (!*ppages) > >> + return -ENOBUFS; > >> + ppages++; > >> + len -= PAGE_SIZE; > >> + } > >> + > >> + return 0; > >> +} > >> + > >> /* Split @vec on page boundaries into SGEs. FMR registers pages, not > >> * a byte range. Other modes coalesce these SGEs into a single MR > >> * when they can. > >> @@ -233,15 +259,6 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf, > >> ppages = xdrbuf->pages + (xdrbuf->page_base >> PAGE_SHIFT); > >> page_base = offset_in_page(xdrbuf->page_base); > >> while (len) { > >> - /* ACL likes to be lazy in allocating pages - ACLs > >> - * are small by default but can get huge. > >> - */ > >> - if (unlikely(xdrbuf->flags & XDRBUF_SPARSE_PAGES)) { > >> - if (!*ppages) > >> - *ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN); > >> - if (!*ppages) > >> - return -ENOBUFS; > >> - } > >> seg->mr_page = *ppages; > >> seg->mr_offset = (char *)page_base; > >> seg->mr_len = min_t(u32, PAGE_SIZE - page_base, len); > >> @@ -867,6 +884,12 @@ rpcrdma_marshal_req(struct rpcrdma_xprt *r_xprt, struct rpc_rqst *rqst) > >> __be32 *p; > >> int ret; > >> > >> + if (unlikely(rqst->rq_rcv_buf.flags & XDRBUF_SPARSE_PAGES)) { > >> + ret = rpcrdma_alloc_sparse_pages(rqst); > >> + if (ret) > >> + return ret; > >> + } > >> + > >> rpcrdma_set_xdrlen(&req->rl_hdrbuf, 0); > >> xdr_init_encode(xdr, &req->rl_hdrbuf, rdmab_data(req->rl_rdmabuf), > >> rqst); > > -- > Chuck Lever > > >