Re: [PATCH] ceph: only dirty ITER_IOVEC pages for direct read

On Sun, 2018-04-01 at 18:33 +0200, Ilya Dryomov wrote:
> On Fri, Mar 30, 2018 at 9:14 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > On Fri, 2018-03-30 at 11:24 +0100, Luis Henriques wrote:
> > > (CC'ing Jeff)
> > > 
> > > Ilya Dryomov <idryomov@xxxxxxxxx> writes:
> > > 
> > > > On Fri, Mar 30, 2018 at 9:26 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
> > > > > 
> > > > > 
> > > > > > On 30 Mar 2018, at 15:20, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > > > > > 
> > > > > > On Fri, Mar 30, 2018 at 5:21 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > > On 30 Mar 2018, at 00:49, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > > > > > > > 
> > > > > > > > On Fri, Mar 16, 2018 at 4:32 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
> > > > > > > > 
> > > > > > > > Hi Zheng,
> > > > > > > > 
> > > > > > > > Can you explain how this fixes the invalid memory access reported in
> > > > > > > > "KASAN: use-after-free Read in set_page_dirty_lock"?  Did you come up
> > > > > > > > with a different reproducer?
> > > > > > > > 
> > > > > > > 
> > > > > > > I don't know why KASAN reports use-after-free (likely a false alarm) in
> > > > > > > this case. I compared the code with other filesystems and found that recent
> > > > > > > commits make other filesystems only dirty ITER_IOVEC pages. The change
> > > > > > > should also silence KASAN.
> > > > > > 
> > > > > > Did you manage to reproduce that KASAN report?
> > > > > > 
> > > > > 
> > > > > no
> > > > > 
> > > > > > If it is a false positive, the patch shouldn't have been marked for
> > > > > > stable.
> > > > > > 
> > > > > 
> > > > > I CCed stable because fuse did the same
> > > > > 
> > > > > commit 8fba54aebbdf1f999738121922e74bf796ad60ee
> > > > > Author: Miklos Szeredi <mszeredi@xxxxxxxxxx>
> > > > > Date:   Wed Aug 24 18:17:04 2016 +0200
> > > > > 
> > > > >     fuse: direct-io: don't dirty ITER_BVEC pages
> > > > > 
> > > > >     When reading from a loop device backed by a fuse file it deadlocks on
> > > > >     lock_page().
> > > > > 
> > > > >     This is because the page is already locked by the read() operation done on
> > > > >     the loop device.  In this case we don't want to either lock the page or
> > > > >     dirty it.
> > > > > 
> > > > >     So do what fs/direct-io.c does: only dirty the page for ITER_IOVEC vectors.
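
For context, the pattern that commit describes looks roughly like the
sketch below. This is illustrative only (not the exact fuse or ceph
hunk), and the helper name is made up:

static void put_dio_pages(struct page **pages, int num_pages,
			  bool should_dirty)
{
	int i;

	/* Only pages that came from a userspace iovec get dirtied after
	 * a direct read fills them.  ITER_BVEC pages (e.g. a loop device
	 * reading through the filesystem) may already be locked by the
	 * caller, so dirtying them here can deadlock on lock_page().
	 */
	for (i = 0; i < num_pages; i++) {
		if (should_dirty)
			set_page_dirty_lock(pages[i]);
		put_page(pages[i]);
	}
}

with the caller deciding up front, along the lines of:

	bool should_dirty = !write && iter_is_iovec(iter);
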
> > > > 
> > > > OK, so it is a potential deadlock which has nothing to do with that
> > > > KASAN report.  I have rewritten the changelog to reflect that, please
> > > > take a look:
> > > > 
> > > >   https://github.com/ceph/ceph-client/commit/85784f9395987a422fa04263e7c0fb13da11eb5c
> > > 
> > > Just as a side note, it's still trivial to get a cephfs-related KASAN
> > > report by running a fuzzer (trinity in my case) against a mainline
> > > kernel 4.16.0-rc7 with this fix backported.
> > > 
> > 
> > I suspect trinity is clobbering the array
> > > As Jeff mentioned in a different thread, splice syscall is broken on
> > > cephfs and the fix for it is still tagged as "DO NOT MERGE" in the
> > > ceph-client testing branch.  I still think there's some bug in Jeff's
> > > fix as I still see a crash occasionally, but I haven't yet finished
> > > debugging it.  Unfortunately, Jeff's fix is probably not appropriate for
> > > stable kernels (but maybe I'm wrong).
> > > 
> > > Cheers,
> > 
> > (cc'ing Al)
> > 
> > Yeah, I should have revisited this a while ago. So the deal is that I
> > posted this patch upstream around a year ago, but Al didn't really like
> > it:
> > 
> >     https://patchwork.kernel.org/patch/9541829/
> > 
> > He wanted to add some iov_iter_for_each_page infrastructure and base
> > the new function on that. I had thought he had something queued up
> > along those lines at one point, but I don't see anything in mainline so
> > far.
> > 
> > There is an iov_iter_for_each_range, but it doesn't handle userland
> > iovecs and doesn't seem to care about alignment. I think we'll need
> > both here.
> > 
> > I'm not sure, however, that a per-page callback is really that helpful
> > for callers like ceph_direct_read_write. We'll just end up having to do
> > more work in the fs to batch up the pages before we add the op to the
> > req.
> > 
> > iov_iter_get_pages_alloc is really sort of what we want in this case,
> > and we want that function to stitch together longer arrays when it can.
> > 
> > Al, do you have any thoughts on what we should do here?
> 
> I think this can be fixed entirely in ceph by walking away from page
> vectors (barely tested):
> 
>   https://github.com/ceph/ceph-client/commits/wip-splice
> 
> If folks would like to see something like iov_iter_get_bvecs{,_alloc}()
> in iov_iter.c, I can work on generalizing my helper (i.e. add maxbvecs
> and maxsize parameters -- I didn't need maxbvecs and maxsize is just
> "make a copy of the iterator and truncate it").  Otherwise we will
> probably merge it into ceph, as this code has remained broken for way
> too long...
> 

I think we should try to keep as much of the gory details in
lib/iov_iter.c as we can, so I'd support lifting most of
iter_get_bvecs_alloc into there.
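
Something along these lines, perhaps (a rough sketch of one possible
prototype, mirroring iov_iter_get_pages_alloc(); the name and parameters
are hypothetical, not an existing API):

/* Hypothetical: pin up to @maxsize bytes from @i and describe them as
 * bio_vecs in a freshly allocated array.  The caller puts the pages and
 * frees @bvecs when the I/O completes.
 */
ssize_t iov_iter_get_bvecs_alloc(struct iov_iter *i, size_t maxsize,
				 struct bio_vec **bvecs, int *num_bvecs);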

> Attached is the main patch from wip-splice.
> 
> Thanks,
> 
>                 Ilya


+	while (iov_iter_count(i)) {
+		struct page *pages[16];
+		ssize_t bytes;
+		size_t start;
+		int idx = 0;
+
+		bytes = iov_iter_get_pages(i, pages, SIZE_MAX, 16, &start);
+		if (bytes < 0)
+			return total ?: bytes;
+
+		for ( ; bytes; idx++, bvec_idx++) {
+			struct bio_vec bv = {
+				.bv_page = pages[idx],
+				.bv_len = min_t(int, bytes, PAGE_SIZE - start),
+				.bv_offset = start,
+			};
+
+			bvecs[bvec_idx] = bv;
+			iov_iter_advance(i, bv.bv_len);
+			total += bv.bv_len;
+			bytes -= bv.bv_len;
+			start = 0;
+		}
+	}
+
+	return total;

It's possible that we'll end up with a set of contiguous bvecs that do
not begin or end on page boundaries. Will the lower-level ceph OSD
client code handle that correctly? If so, then this should be ok. If
not, then we need to be more careful when we're stitching together
bvecs like this and check the alignment as we go.
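
If we do need to be careful, one way to do it while stitching would be a
check along these lines (rough sketch, hypothetical helper name):

/* Hypothetical: a bvec is safe to sit in the middle of the array only
 * if it covers a whole page, so any ragged segment can appear only as
 * the very first or very last bvec handed to the OSD client.
 */
static bool bvec_covers_page(const struct bio_vec *bv)
{
	return bv->bv_offset == 0 && bv->bv_len == PAGE_SIZE;
}

i.e. stop growing the array (or start a new op) as soon as an interior
bvec fails that test.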

-- 
Jeff Layton <jlayton@xxxxxxxxxx>


