On Mon, Apr 2, 2018 at 1:10 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> On Sun, 2018-04-01 at 18:33 +0200, Ilya Dryomov wrote:
>> On Fri, Mar 30, 2018 at 9:14 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>> > On Fri, 2018-03-30 at 11:24 +0100, Luis Henriques wrote:
>> > > (CC'ing Jeff)
>> > >
>> > > Ilya Dryomov <idryomov@xxxxxxxxx> writes:
>> > >
>> > > > On Fri, Mar 30, 2018 at 9:26 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>> > > > >
>> > > > >
>> > > > > > On 30 Mar 2018, at 15:20, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> > > > > >
>> > > > > > On Fri, Mar 30, 2018 at 5:21 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>> > > > > > >
>> > > > > > >
>> > > > > > > > On 30 Mar 2018, at 00:49, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> > > > > > > >
>> > > > > > > > On Fri, Mar 16, 2018 at 4:32 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>> > > > > > > >
>> > > > > > > > Hi Zheng,
>> > > > > > > >
>> > > > > > > > Can you explain how this fixes the invalid memory access reported
>> > > > > > > > in "KASAN: use-after-free Read in set_page_dirty_lock"? Did you
>> > > > > > > > come up with a different reproducer?
>> > > > > > > >
>> > > > > > >
>> > > > > > > I don't know why KASAN reports a use-after-free (it looks like a
>> > > > > > > false alarm) in this case. I compared the code with other
>> > > > > > > filesystems and found that recent commits make them dirty only
>> > > > > > > ITER_IOVEC pages. The change should also make KASAN silent.
>> > > > > >
>> > > > > > Did you manage to reproduce that KASAN report?
>> > > > > >
>> > > > >
>> > > > > No.
>> > > > >
>> > > > > > If it is a false positive, the patch shouldn't have been marked for
>> > > > > > stable.
>> > > > > >
>> > > > >
>> > > > > I CCed stable because fuse did the same:
>> > > > >
>> > > > > commit 8fba54aebbdf1f999738121922e74bf796ad60ee
>> > > > > Author: Miklos Szeredi <mszeredi@xxxxxxxxxx>
>> > > > > Date:   Wed Aug 24 18:17:04 2016 +0200
>> > > > >
>> > > > >     fuse: direct-io: don't dirty ITER_BVEC pages
>> > > > >
>> > > > >     When reading from a loop device backed by a fuse file it deadlocks
>> > > > >     on lock_page().
>> > > > >
>> > > > >     This is because the page is already locked by the read() operation
>> > > > >     done on the loop device. In this case we don't want to either lock
>> > > > >     the page or dirty it.
>> > > > >
>> > > > >     So do what fs/direct-io.c does: only dirty the page for ITER_IOVEC
>> > > > >     vectors.
>> > > >
>> > > > OK, so it is a potential deadlock which has nothing to do with that
>> > > > KASAN report. I have rewritten the changelog to reflect that, please
>> > > > take a look:
>> > > >
>> > > > https://github.com/ceph/ceph-client/commit/85784f9395987a422fa04263e7c0fb13da11eb5c
>> > >
>> > > Just as a side note, it's still trivial to get a cephfs-related KASAN
>> > > report by running a fuzzer (trinity in my case) against a mainline
>> > > 4.16.0-rc7 kernel with this fix backported.
>> >
>> > I suspect trinity is clobbering the array.
>> >
>> > > As Jeff mentioned in a different thread, the splice syscall is broken
>> > > on cephfs, and the fix for it is still tagged as "DO NOT MERGE" in the
>> > > ceph-client testing branch. I still think there's some bug in Jeff's
>> > > fix, as I still see a crash occasionally, but I haven't finished
>> > > debugging it yet. Unfortunately, Jeff's fix is probably not
>> > > appropriate for stable kernels (but maybe I'm wrong).
>> > >
>> > > Cheers,
>> >
>> > (cc'ing Al)
>> >
>> > Yeah, I should have revisited this a while ago.
>> > So the deal is that I posted this patch upstream around a year ago,
>> > but Al didn't really like it:
>> >
>> > https://patchwork.kernel.org/patch/9541829/
>> >
>> > He wanted to add some iov_iter_for_each_page infrastructure and base
>> > the new function on that. I had thought he had something queued up
>> > along those lines at one point, but I don't see anything in mainline
>> > so far.
>> >
>> > There is an iov_iter_for_each_range, but it doesn't handle userland
>> > iovecs and doesn't seem to care about alignment. I think we'll need
>> > both here.
>> >
>> > I'm not sure, however, that a per-page callback is really that
>> > helpful for callers like ceph_direct_read_write. We'll just end up
>> > having to do more work in the fs to batch up the pages before we add
>> > the op to the req.
>> >
>> > iov_iter_get_pages_alloc is really sort of what we want in this case,
>> > and we want that function to stitch together longer arrays when it
>> > can.
>> >
>> > Al, do you have any thoughts on what we should do here?
>>
>> I think this can be fixed entirely in ceph by walking away from page
>> vectors (barely tested):
>>
>> https://github.com/ceph/ceph-client/commits/wip-splice
>>
>> If folks would like to see something like iov_iter_get_bvecs{,_alloc}()
>> in iov_iter.c, I can work on generalizing my helper (i.e. add maxbvecs
>> and maxsize parameters -- I didn't need maxbvecs, and maxsize is just
>> "make a copy of the iterator and truncate it"). Otherwise we will
>> probably merge it into ceph, as this code has remained broken for way
>> too long...
>
> I think we should try to keep as many of the gory details in
> lib/iov_iter.c as we can, so I'd support lifting most of
> iter_get_bvecs_alloc into there.
>
>> Attached is the main patch from wip-splice.
>>
>> Thanks,
>>
>>                 Ilya
>
> + while (iov_iter_count(i)) {
> +         struct page *pages[16];
> +         ssize_t bytes;
> +         size_t start;
> +         int idx = 0;
> +
> +         bytes = iov_iter_get_pages(i, pages, SIZE_MAX, 16, &start);
> +         if (bytes < 0)
> +                 return total ?: bytes;
> +
> +         for ( ; bytes; idx++, bvec_idx++) {
> +                 struct bio_vec bv = {
> +                         .bv_page = pages[idx],
> +                         .bv_len = min_t(int, bytes, PAGE_SIZE - start),
> +                         .bv_offset = start,
> +                 };
> +
> +                 bvecs[bvec_idx] = bv;
> +                 iov_iter_advance(i, bv.bv_len);
> +                 total += bv.bv_len;
> +                 bytes -= bv.bv_len;
> +                 start = 0;
> +         }
> + }
> +
> + return total;
>
> It's possible that we'll end up with a set of contiguous bvecs that do
> not begin or end on page boundaries. Will the lower-level ceph OSD
> client code handle that correctly? If so, then this should be OK. If
> not, then we need to be more careful when we're stitching together
> iovecs like this and check the alignment as we go.

Yes, it should be handled correctly, just like a bio with an arbitrary
set of bvecs in it.

Thanks,

                Ilya
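
To make the fix under discussion concrete: the rule that fs/direct-io.c,
fuse, and the ceph patch all follow is "dirty only ITER_IOVEC pages".
Below is a minimal sketch of that pattern, not the actual fs/fuse or
fs/ceph code; the helper name and signature are invented for
illustration:

    #include <linux/mm.h>
    #include <linux/uio.h>

    /*
     * Release pages that were pinned for direct I/O.  Dirty a page only
     * if the data arrived via a read *and* the pages came from a
     * user-space (ITER_IOVEC) iterator.  ITER_BVEC pages belong to a
     * kernel caller such as the loop driver and may already be locked,
     * so set_page_dirty_lock() on them risks the lock_page() deadlock
     * described in the fuse commit quoted above.
     */
    static void sketch_put_dio_pages(struct page **pages, int num_pages,
                                     bool should_dirty)
    {
            int i;

            for (i = 0; i < num_pages; i++) {
                    if (should_dirty)
                            set_page_dirty_lock(pages[i]);
                    put_page(pages[i]);
            }
    }

The caller would compute the flag once at submission time, along the
lines of "bool should_dirty = is_read && iter_is_iovec(iter);", so the
completion path never has to touch the iterator again.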
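
The hunk Jeff quotes is only the body of Ilya's helper. As a rough guess
at the enclosing function (the name, signature, 16-page batch size, and
the assumption that the caller sized @bvecs are inferred from the
snippet, not taken from the wip-splice branch):

    #include <linux/bvec.h>
    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/uio.h>

    static ssize_t sketch_iter_get_bvecs(struct iov_iter *i,
                                         struct bio_vec *bvecs)
    {
            ssize_t total = 0;
            int bvec_idx = 0;

            while (iov_iter_count(i)) {
                    struct page *pages[16];
                    ssize_t bytes;
                    size_t start;
                    int idx = 0;

                    /* Pin up to 16 pages; @start is the offset into the
                     * first pinned page. */
                    bytes = iov_iter_get_pages(i, pages, SIZE_MAX, 16,
                                               &start);
                    if (bytes < 0)
                            return total ?: bytes;

                    /* Split the pinned span into single-page bvecs. */
                    for ( ; bytes; idx++, bvec_idx++) {
                            struct bio_vec bv = {
                                    .bv_page = pages[idx],
                                    .bv_len = min_t(int, bytes,
                                                    PAGE_SIZE - start),
                                    .bv_offset = start,
                            };

                            bvecs[bvec_idx] = bv;
                            iov_iter_advance(i, bv.bv_len);
                            total += bv.bv_len;
                            bytes -= bv.bv_len;
                            /* Only the first page of a pinned span can
                             * start mid-page. */
                            start = 0;
                    }
            }

            return total;
    }

Within one iov_iter_get_pages() batch, only the first bvec can have a
non-zero bv_offset and only the last can be shorter than PAGE_SIZE. With
a misaligned multi-segment iovec, though, each batch can start and end
mid-page, so the stitched array can contain interior bvecs that do not
fall on page boundaries; that is the case Jeff's alignment question is
about.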
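
Ilya's "make a copy of the iterator and truncate it" remark about a
maxsize parameter refers to a common iov_iter idiom. A hedged sketch,
reusing the hypothetical sketch_iter_get_bvecs() from above:

    /*
     * struct iov_iter is small enough to copy by value, so cap a local
     * copy with iov_iter_truncate() and advance the caller's iterator
     * only by what was actually consumed.
     */
    static ssize_t sketch_iter_get_bvecs_capped(struct iov_iter *i,
                                                struct bio_vec *bvecs,
                                                size_t maxsize)
    {
            struct iov_iter tmp = *i;
            ssize_t total;

            iov_iter_truncate(&tmp, maxsize);
            total = sketch_iter_get_bvecs(&tmp, bvecs);
            if (total > 0)
                    iov_iter_advance(i, total);
            return total;
    }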