Re: [PATCH] ceph: only dirty ITER_IOVEC pages for direct read

Ilya Dryomov <idryomov@xxxxxxxxx> writes:

> On Fri, Mar 30, 2018 at 12:24 PM, Luis Henriques <lhenriques@xxxxxxxx> wrote:
>> (CC'ing Jeff)
>>
>> Ilya Dryomov <idryomov@xxxxxxxxx> writes:
>>
>>> On Fri, Mar 30, 2018 at 9:26 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>>>>
>>>>
>>>>> On 30 Mar 2018, at 15:20, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>>>
>>>>> On Fri, Mar 30, 2018 at 5:21 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>>>>>>
>>>>>>
>>>>>>> On 30 Mar 2018, at 00:49, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Fri, Mar 16, 2018 at 4:32 AM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> Hi Zheng,
>>>>>>>
>>>>>>> Can you explain how this fixes the invalid memory access reported in
>>>>>>> "KASAN: use-after-free Read in set_page_dirty_lock"?  Did you come up
>>>>>>> with a different reproducer?
>>>>>>>
>>>>>>
>>>>>> I don't know why KASAN reports a use-after-free (it looks like a false
>>>>>> alarm) in this case. I compared the code with other filesystems and found
>>>>>> that recent commits make them dirty only ITER_IOVEC pages. The change
>>>>>> should also silence KASAN.
>>>>>
>>>>> Did you manage to reproduce that KASAN report?
>>>>>
>>>> no
>>>>
>>>>> If it is a false positive, the patch shouldn't have been marked for
>>>>> stable.
>>>>>
>>>>
>>>> I CCed stable because fuse did the same:
>>>>
>>>> commit 8fba54aebbdf1f999738121922e74bf796ad60ee
>>>> Author: Miklos Szeredi <mszeredi@xxxxxxxxxx>
>>>> Date:   Wed Aug 24 18:17:04 2016 +0200
>>>>
>>>>     fuse: direct-io: don't dirty ITER_BVEC pages
>>>>
>>>>     When reading from a loop device backed by a fuse file it deadlocks on
>>>>     lock_page().
>>>>
>>>>     This is because the page is already locked by the read() operation done on
>>>>     the loop device.  In this case we don't want to either lock the page or
>>>>     dirty it.
>>>>
>>>>     So do what fs/direct-io.c does: only dirty the page for ITER_IOVEC vectors.
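
(Side note for anyone following along: the pattern fuse adopted there, and
which the ceph patch mirrors, boils down to something like the sketch below.
This is simplified and uses made-up variable names, not the exact code from
either commit.)

	/*
	 * Only dirty pages that back user iovecs: the direct read wrote
	 * into them, so they must be marked dirty.  ITER_BVEC pages (e.g.
	 * handed in by the loop driver) are already locked by the caller
	 * and must be left alone, otherwise we deadlock on lock_page().
	 */
	bool should_dirty = !write && iter_is_iovec(iter);
	int i;

	for (i = 0; i < num_pages; i++) {
		if (should_dirty)
			set_page_dirty_lock(pages[i]);
		put_page(pages[i]);
	}
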
>>>
>>> OK, so it is a potential deadlock which has nothing to do with that
>>> KASAN report.  I have rewritten the changelog to reflect that, please
>>> take a look:
>>>
>>>   https://github.com/ceph/ceph-client/commit/85784f9395987a422fa04263e7c0fb13da11eb5c
>>
>> Just as a side note, it's still trivial to get a cephfs-related KASAN
>> report by running a fuzzer (trinity in my case) against a mainline
>> kernel 4.16.0-rc7 with this fix backported.
>
> Right.  This patch fixes a deadlock which is completely unrelated.  The
> report I was referring to is probably a by-product of the ceph splice
> brokenness in mainline and not a false positive.
>
>>
>> As Jeff mentioned in a different thread, splice syscall is broken on
>> cephfs and the fix for it is still tagged as "DO NOT MERGE" in the
>> ceph-client testing branch.  I still think there's some bug in Jeff's
>
> Yes, that's why I asked Zheng about a different reproducer.  The
> auto-generated one is based on splice, but it didn't trigger anything
> for me on the testing branch (i.e. with Jeff's splice patches).

What I've been using is simply to run trinity with:

 # trinity -V /some/dir/in/cephfs -c splice

With Jeff's patches it may take a while until you get a splat.  However,
I just realised that I may be wrong in my analysis and that Jeff's patches
are _probably_ correct.  I've been paying too much attention to the
KASAN report (the one I'm currently looking at is "slab-out-of-bounds in
iov_iter_get_pages_alloc").  But what's interesting is what follows it:

[ 2204.022588] iov_iter_get_pages_alloc: array overrun (ffff88006bb63bf0 > ffff88006bb63b88 + 12)
[ 2204.024546] WARNING: CPU: 1 PID: 276 at lib/iov_iter.c:1297 iov_iter_get_pages_alloc+0x1257/0x1290
...

(the line number is likely incorrect, but it's the only WARN_ONCE() in
lib/iov_iter.c).

So, it looks like the bvec is being corrupted somewhere else, and I
wonder if that could be something similar to what ext4 fixed with commit
e9e3bcecf44c ("ext4: serialize unaligned asynchronous DIO").  It is
possible that the cephfs client also needs to serialize calls to
ceph_direct_read_write(), but I'm still investigating whether that is
actually the problem.
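
To make the idea concrete, this is the kind of serialization I have in
mind (purely illustrative, all names made up, not a proposed patch):
overlapping direct I/O on the same inode would have to wait behind a
per-inode lock before issuing new requests.

	/*
	 * Illustrative sketch only (not ceph code, names are made up).
	 * Serializing direct I/O per inode would prevent concurrent
	 * requests from stepping on each other's page/bvec arrays.
	 */
	static ssize_t example_direct_io(struct example_inode *ei,
					 struct iov_iter *iter,
					 loff_t off, bool write)
	{
		ssize_t ret;

		mutex_lock(&ei->dio_mutex);	/* hypothetical per-inode lock */
		ret = example_issue_osd_requests(ei, iter, off, write);
		mutex_unlock(&ei->dio_mutex);

		return ret;
	}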

>> fix as I still see a crash occasionally, but I haven't yet finished
>> debugging it.  Unfortunately, Jeff's fix is probably not appropriate for
>> stable kernels (but maybe I'm wrong).
>
> Need to get it in mainline in some form first...

Yes, of course.  Let's see if Jeff has any ideas on this and maybe he
can get those patches ready for 4.17.

Cheers,
-- 
Luis
>
> Thanks,
>
>                 Ilya
>