On Sun, Apr 01, 2018 at 06:33:31PM +0200, Ilya Dryomov wrote:
> On Fri, Mar 30, 2018 at 9:14 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > On Fri, 2018-03-30 at 11:24 +0100, Luis Henriques wrote:
> >> (CC'ing Jeff)
> >>
> >> Ilya Dryomov <idryomov@xxxxxxxxx> writes:
> >>
> >> > On Fri, Mar 30, 2018 at 9:26 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
> >> > >
> >> > >
> >> > > > On 30 Mar 2018, at 15:20, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> >> > > >
> >> > > > On Fri, Mar 30, 2018 at 5:21 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
> >> > > > >
> >> > > > >
> >> > > > > > On 30 Mar 2018, at 00:49, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> >> > > > > >
> >> > > > > > On Fri, Mar 16, 2018 at 4:32 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
> >> > > > > >
> >> > > > > > Hi Zheng,
> >> > > > > >
> >> > > > > > Can you explain how this fixes the invalid memory access reported in
> >> > > > > > "KASAN: use-after-free Read in set_page_dirty_lock"?  Did you come up
> >> > > > > > with a different reproducer?
> >> > > > > >
> >> > > > >
> >> > > > > I don't know why KASAN reports use-after-free (likely a false alarm) in
> >> > > > > this case.  I compared the code with other filesystems and found that
> >> > > > > recent commits make them dirty only ITER_IOVEC pages.  The change should
> >> > > > > also make KASAN silent.
> >> > > >
> >> > > > Did you manage to reproduce that KASAN report?
> >> > > >
> >> > >
> >> > > no
> >> > >
> >> > > > If it is a false positive, the patch shouldn't have been marked for
> >> > > > stable.
> >> > > >
> >> > >
> >> > > I CCed stable because fuse did the same:
> >> > >
> >> > > commit 8fba54aebbdf1f999738121922e74bf796ad60ee
> >> > > Author: Miklos Szeredi <mszeredi@xxxxxxxxxx>
> >> > > Date:   Wed Aug 24 18:17:04 2016 +0200
> >> > >
> >> > >     fuse: direct-io: don't dirty ITER_BVEC pages
> >> > >
> >> > >     When reading from a loop device backed by a fuse file it deadlocks on
> >> > >     lock_page().
> >> > >
> >> > >     This is because the page is already locked by the read() operation done
> >> > >     on the loop device.  In this case we don't want to either lock the page
> >> > >     or dirty it.
> >> > >
> >> > >     So do what fs/direct-io.c does: only dirty the page for ITER_IOVEC
> >> > >     vectors.
> >> >
> >> > OK, so it is a potential deadlock which has nothing to do with that
> >> > KASAN report.  I have rewritten the changelog to reflect that, please
> >> > take a look:
> >> >
> >> >   https://github.com/ceph/ceph-client/commit/85784f9395987a422fa04263e7c0fb13da11eb5c
> >>
> >> Just as a side note, it's still trivial to get a cephfs-related KASAN
> >> report by running a fuzzer (trinity in my case) against a mainline
> >> 4.16.0-rc7 kernel with this fix backported.
> >>
> >
> > I suspect trinity is clobbering the array.
> >
> >> As Jeff mentioned in a different thread, the splice syscall is broken on
> >> cephfs and the fix for it is still tagged as "DO NOT MERGE" in the
> >> ceph-client testing branch.  I still think there's some bug in Jeff's
> >> fix, as I still see a crash occasionally, but I haven't yet finished
> >> debugging it.  Unfortunately, Jeff's fix is probably not appropriate for
> >> stable kernels (but maybe I'm wrong).
> >>
> >> Cheers,
> >
> > (cc'ing Al)
> >
> > Yeah, I should have revisited this a while ago.  So the deal is that I
> > posted this patch upstream around a year ago, but Al didn't really like
> > it:
> >
> > https://patchwork.kernel.org/patch/9541829/
> >
> > He wanted to add some iov_iter_for_each_page infrastructure and base
> > the new function on that.  I had thought he had something queued up
> > along those lines at one point, but I don't see anything in mainline so
> > far.
> >
> > There is an iov_iter_for_each_range, but it doesn't handle userland
> > iovecs and doesn't seem to care about alignment.  I think we'll need
> > both here.
> >
> > I'm not sure, however, that a per-page callback is really that helpful
> > for callers like ceph_direct_read_write.  We'll just end up having to do
> > more work in the fs to batch up the pages before we add the op to the
> > req.
> >
> > iov_iter_get_pages_alloc is really sort of what we want in this case,
> > and we want that function to stitch together longer arrays when it can.
> >
> > Al, do you have any thoughts on what we should do here?
>
> I think this can be fixed entirely in ceph by walking away from page
> vectors (barely tested):
>
>   https://github.com/ceph/ceph-client/commits/wip-splice
>
> If folks would like to see something like iov_iter_get_bvecs{,_alloc}()
> in iov_iter.c, I can work on generalizing my helper (i.e. add maxbvecs
> and maxsize parameters -- I didn't need maxbvecs, and maxsize is just
> "make a copy of the iterator and truncate it").  Otherwise we will
> probably merge it into ceph, as this code has remained broken for way
> too long...
>
> Attached is the main patch from wip-splice.
>
> Thanks,
>
>                 Ilya

I haven't (yet) spent much time looking at the code, but a quick test
shows me the below failure in the sanity() tests from lib/iov_iter.c.
Cheers,
--
Luís

[  235.134588] idx = 7, offset = 4096
[  235.134917] curbuf = 8, nrbufs = 1, buffers = 16
[  235.135388] [          (null) 00000000a5376b82 0 4096]
[  235.135914] [          (null) 00000000fb52416a 0 4096]
[  235.136421] [          (null) 0000000009c7c0fc 0 4096]
[  235.136901] [          (null) 000000008b9911f0 0 4096]
[  235.137411] [          (null) 0000000042e4a3e2 0 4096]
[  235.137879] [          (null) 00000000110f4fc8 0 4096]
[  235.138358] [          (null) 00000000e16bdd11 0 4096]
[  235.138910] [          (null) 000000007d61a8f6 0 4096]
[  235.139399] [00000000af9d005b 000000003b50ae93 0 4096]
[  235.139879] [          (null) 00000000c55231a4 0 4096]
[  235.140378] [          (null) 00000000eba243d2 0 4096]
[  235.140851] [          (null) 00000000f5617952 0 4096]
[  235.141344] [          (null) 00000000fec2c691 0 4096]
[  235.141836] [          (null) 00000000ca09f9a4 0 4096]
[  235.142360] [          (null) 0000000087cfbb92 0 4096]
[  235.142886] [          (null) 000000009432c839 0 4096]
[  235.143441] WARNING: CPU: 1 PID: 261 at lib/iov_iter.c:350 sanity+0x162/0x210
[  235.144208] CPU: 1 PID: 261 Comm: trinity-c5 Not tainted 4.16.0-rc7+ #52
[  235.144942] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[  235.146057] RIP: 0010:sanity+0x162/0x210
[  235.146509] RSP: 0018:ffff8800656970e8 EFLAGS: 00010246
[  235.147078] RAX: 0000000000000000 RBX: ffff88006852c868 RCX: ffffffff81589cb3
[  235.147841] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff88006852c8a8
[  235.148636] RBP: 0000000000000010 R08: fffffbfff059aa08 R09: ffff88006852c8e0
[  235.149470] R10: 0000000000000003 R11: fffffbfff059aa07 R12: ffff88006406dd38
[  235.150340] R13: ffff88006852c8a8 R14: 0000000000000000 R15: 0000000000001000
[  235.151184] FS:  00007fd75163eb40(0000) GS:ffff88006d700000(0000) knlGS:0000000000000000
[  235.152028] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  235.152562] CR2: 00007fd751501000 CR3: 000000006a940000 CR4: 00000000000006a0
[  235.153232] DR0: 00007fd7514fe000 DR1: 0000000000000000 DR2: 0000000000000000
[  235.153887] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000f0602
[  235.154644] Call Trace:
[  235.154914]  iov_iter_get_pages+0x2a3/0x680
[  235.155377]  ? iov_iter_gap_alignment+0x490/0x490
[  235.155882]  ? iov_iter_advance+0x360/0x780
[  235.156334]  ? generic_file_splice_read+0x21b/0x320
[  235.156993]  ? SyS_splice+0x914/0x9a0
[  235.157498]  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  235.158177]  ? iov_iter_copy_from_user_atomic+0x6e0/0x6e0
[  235.158957]  ? kmemleak_disable+0x70/0x70
[  235.159456]  __iter_get_bvecs+0x117/0x290
[  235.159955]  ? ceph_fallocate+0xc10/0xc10
[  235.160455]  ? iov_iter_npages+0x309/0x540
[  235.160974]  ? __kmalloc+0xf9/0x1f0
[  235.161410]  ceph_direct_read_write+0x4e7/0x1380
[  235.161981]  ? ceph_aio_complete_req+0x750/0x750
[  235.162503]  ? ceph_write_iter+0x103b/0x1530
[  235.162986]  ? __rb_erase_color+0xd50/0xd50
[  235.163449]  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  235.164043]  ? ceph_fscache_revalidate_cookie+0x80/0x1d0
[  235.164630]  ? ceph_fscache_unregister_fs+0x1f0/0x1f0
[  235.165201]  ? __ceph_mark_dirty_caps+0x212/0x610
[  235.165713]  ? ceph_get_caps+0x2bc/0x700
[  235.166147]  ? _raw_spin_unlock_bh+0xc0/0xc0
[  235.166681]  ? ceph_put_cap_refs+0x5a0/0x5a0
[  235.167182]  ? __delete_object+0xcd/0x100
[  235.167636]  ? put_object+0x40/0x40
[  235.168033]  ? up_read+0x20/0x20
[  235.168396]  ? ceph_write_iter+0x103b/0x1530
[  235.168865]  ? kmem_cache_free+0x80/0x1d0
[  235.169252]  ? ceph_write_iter+0x1047/0x1530
[  235.169692]  ? ceph_read_iter+0xc3e/0x10b0
[  235.170097]  ceph_read_iter+0xc3e/0x10b0
[  235.170519]  ? __tty_buffer_request_room+0xd0/0x320
[  235.171138]  ? ceph_write_iter+0x1530/0x1530
[  235.171667]  ? timerqueue_add+0xd2/0x100
[  235.172239]  ? enqueue_hrtimer+0xfb/0x1d0
[  235.172789]  ? hrtimer_reprogram+0x130/0x130
[  235.173361]  ? _raw_spin_unlock_bh+0xc0/0xc0
[  235.173913]  ? _raw_spin_unlock_bh+0xc0/0xc0
[  235.174464]  ? hrtimer_start_range_ns+0x32d/0x440
[  235.174979]  ? hrtimer_forward+0x120/0x120
[  235.175394]  ? hrtimer_try_to_cancel+0x83/0x250
[  235.176033]  ? hrtimer_active+0x240/0x240
[  235.176630]  ? __hrtimer_get_remaining+0xc6/0x100
[  235.177285]  ? hrtimer_start_range_ns+0x440/0x440
[  235.177980]  ? generic_file_splice_read+0x21b/0x320
[  235.178694]  ? ceph_write_iter+0x1530/0x1530
[  235.179325]  generic_file_splice_read+0x21b/0x320
[  235.180011]  ? splice_shrink_spd+0x40/0x40
[  235.180619]  ? schedule+0x50/0x280
[  235.181126]  ? rw_verify_area+0x1f/0x70
[  235.181641]  ? do_splice_to+0x6a/0xc0
[  235.182075]  SyS_splice+0x914/0x9a0
[  235.182405]  ? __task_pid_nr_ns+0x1a9/0x1c0
[  235.182825]  ? compat_SyS_vmsplice+0x100/0x100
[  235.183266]  ? do_syscall_64+0x8d/0x300
[  235.183701]  ? compat_SyS_vmsplice+0x100/0x100
[  235.184223]  do_syscall_64+0x164/0x300
[  235.184662]  ? syscall_return_slowpath+0x1c0/0x1c0
[  235.185213]  ? page_fault+0x25/0x50
[  235.185616]  ? syscall_return_slowpath+0x13c/0x1c0
[  235.186124]  ? prepare_exit_to_usermode+0xdb/0x140
[  235.186692]  ? syscall_trace_enter+0x320/0x320
[  235.187206]  ? __put_user_4+0x1c/0x30
[  235.187858]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  235.188726] RIP: 0033:0x7fd750f61229
[  235.189312] RSP: 002b:00007fff1ad745c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000113
[  235.190568] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd750f61229
[  235.191687] RDX: 000000000000003d RSI: 0000000000000000 RDI: 0000000000000098
[  235.192598] RBP: 00007fd751614000 R08: 0000000000fffff7 R09: 0000000000000001
[  235.193528] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000113
[  235.194443] R13: 00007fd751614048 R14: 00007fd75163ead8 R15: 00007fd751614000
[  235.195364] Code: 4c 89 ff e8 f1 43 cf ff 8b 73 3c 44 89 e1 48 c7 c7 80 13 11 82 89 ea e8 87 8d b8 ff 4c 89 ef e8 d5 43 cf ff 8b 43 40 85 c0 75 13 <0f> 0b 48 83 c4 10 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 4c 8d
[  235.197769] ---[ end trace 0228543143d962c4 ]---
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html