Re: [PATCH] ceph: only dirty ITER_IOVEC pages for direct read

On Mon, Apr 2, 2018 at 1:10 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> On Sun, 2018-04-01 at 18:33 +0200, Ilya Dryomov wrote:
>> On Fri, Mar 30, 2018 at 9:14 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>> > On Fri, 2018-03-30 at 11:24 +0100, Luis Henriques wrote:
>> > > (CC'ing Jeff)
>> > >
>> > > Ilya Dryomov <idryomov@xxxxxxxxx> writes:
>> > >
>> > > > On Fri, Mar 30, 2018 at 9:26 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>> > > > >
>> > > > >
>> > > > > > On 30 Mar 2018, at 15:20, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> > > > > >
>> > > > > > On Fri, Mar 30, 2018 at 5:21 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>> > > > > > >
>> > > > > > >
>> > > > > > > > On 30 Mar 2018, at 00:49, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> > > > > > > >
>> > > > > > > > On Fri, Mar 16, 2018 at 4:32 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>> > > > > > > >
>> > > > > > > > Hi Zheng,
>> > > > > > > >
>> > > > > > > > Can you explain how this fixes the invalid memory access reported in
>> > > > > > > > "KASAN: use-after-free Read in set_page_dirty_lock"?  Did you come up
>> > > > > > > > with a different reproducer?
>> > > > > > > >
>> > > > > > >
>> > > > > > > I don't know why KASAN reports a use-after-free (it looks like a false
>> > > > > > > alarm) in this case. I compared the code with other filesystems and
>> > > > > > > found that recent commits make them only dirty ITER_IOVEC pages. The
>> > > > > > > change should also make KASAN silent.
>> > > > > >
>> > > > > > Did you manage to reproduce that KASAN report?
>> > > > > >
>> > > > >
>> > > > > no
>> > > > >
>> > > > > > If it is a false positive, the patch shouldn't have been marked for
>> > > > > > stable.
>> > > > > >
>> > > > >
>> > > > > I CCed stable because fuse did the same:
>> > > > >
>> > > > > commit 8fba54aebbdf1f999738121922e74bf796ad60ee
>> > > > > Author: Miklos Szeredi <mszeredi@xxxxxxxxxx>
>> > > > > Date:   Wed Aug 24 18:17:04 2016 +0200
>> > > > >
>> > > > >     fuse: direct-io: don't dirty ITER_BVEC pages
>> > > > >
>> > > > >     When reading from a loop device backed by a fuse file it deadlocks on
>> > > > >     lock_page().
>> > > > >
>> > > > >     This is because the page is already locked by the read() operation done on
>> > > > >     the loop device.  In this case we don't want to either lock the page or
>> > > > >     dirty it.
>> > > > >
>> > > > >     So do what fs/direct-io.c does: only dirty the page for ITER_IOVEC vectors.
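
For reference, the pattern those commits converge on can be sketched like
this (illustrative only; the helper name and surrounding context are made
up, not the actual fuse or ceph code): the caller records whether the
iterator is a userspace iovec before consuming it, and only then dirties
the pinned pages when releasing them.

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/uio.h>

/*
 * Release pages pinned for a direct read.  Only mark them dirty when the
 * data landed in a userspace iovec; ITER_BVEC/ITER_KVEC pages (e.g. a
 * loop device reading through the filesystem) must not be locked or
 * dirtied here.
 */
static void dio_put_pages(struct page **pages, int num_pages,
			  bool should_dirty)
{
	int i;

	for (i = 0; i < num_pages; i++) {
		if (should_dirty)
			set_page_dirty_lock(pages[i]);
		put_page(pages[i]);
	}
}

The caller would compute "should_dirty = !write && iter_is_iovec(iter)"
before the iterator is advanced, mirroring the check the fuse commit adds.
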
>> > > >
>> > > > OK, so it is a potential deadlock which has nothing to do with that
>> > > > KASAN report.  I have rewritten the changelog to reflect that, please
>> > > > take a look:
>> > > >
>> > > >   https://github.com/ceph/ceph-client/commit/85784f9395987a422fa04263e7c0fb13da11eb5c
>> > >
>> > > Just as a side note, it's still trivial to get a cephfs-related KASAN
>> > > report by running a fuzzer (trinity in my case) against a mainline
>> > > kernel 4.16.0-rc7 with this fix backported.
>> > >
>> >
>> > I suspect trinity is clobbering the array
>> > > As Jeff mentioned in a different thread, splice syscall is broken on
>> > > cephfs and the fix for it is still tagged as "DO NOT MERGE" in the
>> > > ceph-client testing branch.  I still think there's some bug in Jeff's
>> > > fix as I still see a crash occasionally, but I haven't yet finished
>> > > debugging it.  Unfortunately, Jeff's fix is probably not appropriate for
>> > > stable kernels (but maybe I'm wrong).
>> > >
>> > > Cheers,
>> >
>> > (cc'ing Al)
>> >
>> > Yeah, I should have revisited this a while ago. So the deal is that I
>> > posted this patch upstream around a year ago, but Al didn't really like
>> > it:
>> >
>> >     https://patchwork.kernel.org/patch/9541829/
>> >
>> > He wanted to add some iov_iter_for_each_page infrastructure and base
>> > the new function on that. I had thought he had something queued up
>> > along those lines at one point, but I don't see anything in mainline so
>> > far.
>> >
>> > There is an iov_iter_for_each_range, but it doesn't handle userland
>> > iovecs and doesn't seem to care about alignment. I think we'll need
>> > both here.
>> >
>> > I'm not sure, however, that a per-page callback is really that helpful
>> > for callers like ceph_direct_read_write. We'll just end up having to do
>> > more work in the fs to batch up the pages before we add the op to the
>> > req.
>> >
>> > iov_iter_get_pages_alloc is really sort of what we want in this case,
>> > and we want that function to stitch together longer arrays when it can.
>> >
>> > Al, do you have any thoughts on what we should do here?
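
To make the point concrete, here is roughly the shape such a caller takes
when iov_iter_get_pages_alloc() hands back one page array per call; the
helper name and exact interface are made up for illustration, not the
actual ceph code:

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/uio.h>

/*
 * Pin up to 'count' bytes worth of pages from the iterator in one go.
 * Returns the number of bytes covered; *page_off is the offset into the
 * first page and *num_pages the length of the allocated array.  The
 * caller releases the pages (dirtying them only for ITER_IOVEC) and
 * frees the array with kvfree().
 */
static ssize_t pin_iter_pages(struct iov_iter *iter, size_t count,
			      struct page ***ppages, size_t *page_off,
			      int *num_pages)
{
	ssize_t bytes;

	bytes = iov_iter_get_pages_alloc(iter, ppages, count, page_off);
	if (bytes < 0)
		return bytes;

	*num_pages = DIV_ROUND_UP(*page_off + bytes, PAGE_SIZE);
	iov_iter_advance(iter, bytes);
	return bytes;
}

The longer the arrays iov_iter_get_pages_alloc() can stitch together, the
fewer ops the filesystem has to split a request into.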
>>
>> I think this can be fixed entirely in ceph by walking away from page
>> vectors (barely tested):
>>
>>   https://github.com/ceph/ceph-client/commits/wip-splice
>>
>> If folks would like to see something like iov_iter_get_bvecs{,_alloc}()
>> in iov_iter.c, I can work on generalizing my helper (i.e. add maxbvecs
>> and maxsize parameters -- I didn't need maxbvecs and maxsize is just
>> "make a copy of the iterator and truncate it").  Otherwise we will
>> probably merge it into ceph, as this code has remained broken for way
>> too long...
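
To sketch the generalization being offered (interface and helper names
are hypothetical, not an existing API): the maxsize handling is a
copy-and-truncate around the conversion loop, with the caller's iterator
advanced afterwards by however much was actually converted.

#include <linux/bvec.h>
#include <linux/uio.h>

/* hypothetical: stands in for the pages-to-bvecs loop in the attached patch */
static ssize_t __iov_iter_to_bvecs(struct iov_iter *i, struct bio_vec *bvecs,
				   unsigned int maxbvecs);

/*
 * Convert at most 'maxsize' bytes of the iterator into at most 'maxbvecs'
 * bvecs.  A copy of the iterator is truncated and consumed; the caller's
 * iterator is then advanced by the number of bytes converted.
 */
static ssize_t iov_iter_get_bvecs(struct iov_iter *iter, size_t maxsize,
				  struct bio_vec *bvecs, unsigned int maxbvecs)
{
	struct iov_iter tmp = *iter;
	ssize_t bytes;

	iov_iter_truncate(&tmp, maxsize);
	bytes = __iov_iter_to_bvecs(&tmp, bvecs, maxbvecs);
	if (bytes > 0)
		iov_iter_advance(iter, bytes);
	return bytes;
}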
>>
>
> I think we should try to keep as many of the gory details in
> lib/iov_iter.c as we can, so I'd support lifting most of
> iter_get_bvecs_alloc into there.
>
>> Attached is the main patch from wip-splice.
>>
>> Thanks,
>>
>>                 Ilya
>
>
> +       while (iov_iter_count(i)) {
> +               struct page *pages[16];
> +               ssize_t bytes;
> +               size_t start;
> +               int idx = 0;
> +
> +               bytes = iov_iter_get_pages(i, pages, SIZE_MAX, 16, &start);
> +               if (bytes < 0)
> +                       return total ?: bytes;
> +
> +               for ( ; bytes; idx++, bvec_idx++) {
> +                       struct bio_vec bv = {
> +                               .bv_page = pages[idx],
> +                               .bv_len = min_t(int, bytes, PAGE_SIZE - start),
> +                               .bv_offset = start,
> +                       };
> +
> +                       bvecs[bvec_idx] = bv;
> +                       iov_iter_advance(i, bv.bv_len);
> +                       total += bv.bv_len;
> +                       bytes -= bv.bv_len;
> +                       start = 0;
> +               }
> +       }
> +
> +       return total;
>
> It's possible that we'll end up with a set of contiguous bvecs that do
> not begin or end on page boundaries. Will the lower-level ceph OSD
> client code handle that correctly? If so, then this should be ok. If
> not, then we need to be more careful when we're stitching together
> iovecs like this and check the alignment as we go.
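
Were a lower layer to insist on page-aligned joins, the stitching loop
would need a boundary check along these lines before appending another
page (hypothetical helper, shown only to illustrate the concern; as the
reply below explains, the OSD client copes with arbitrary bvecs, so ceph
does not need it):

#include <linux/bvec.h>
#include <linux/mm.h>

/*
 * True if appending a segment starting at offset 'start' right after
 * 'prev' keeps the stitched run page-aligned at the join: the previous
 * bvec must end exactly on a page boundary and the next one must begin
 * at offset 0.
 */
static bool bvec_join_is_page_aligned(const struct bio_vec *prev,
				      size_t start)
{
	if (prev->bv_offset + prev->bv_len != PAGE_SIZE)
		return false;
	return start == 0;
}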

Yes, it should be handled correctly, just like a bio with an arbitrary
set of bvecs in it.

Thanks,

                Ilya