Re: kernel BUG at mm/memory.c:LINE!

Dmitry Vyukov <dvyukov@xxxxxxxxxx> · Tue, 10 Jul 2018 12:02:17 +0200

On Tue, Jul 10, 2018 at 12:07 AM, Kirill A. Shutemov
<kirill@xxxxxxxxxxxxx> wrote:
> On Mon, Jul 09, 2018 at 07:23:15PM +0200, Dmitry Vyukov wrote:
>> On Mon, Jul 9, 2018 at 5:25 PM, Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote:
>> > On Mon, Jul 09, 2018 at 05:21:55PM +0300, Kirill A. Shutemov wrote:
>> >> > This also happened only once so far:
>> >> > https://syzkaller.appspot.com/bug?extid=3f84280d52be9b7083cc
>> >> > and I can't reproduce it by rerunning this program. So it's either a very
>> >> > subtle race, or an fd in the middle of a netlink address magically matched
>> >> > some fd once, or something else...
>> >>
>> >> Okay, I've got it reproduced. See below.
>> >>
>> >> The problem is that kcov doesn't set vm_ops for the VMA and it makes
>> >> kernel think that the VMA is anonymous.
>> >>
>> >> This is not necessarily the way syzkaller triggered it. I just found
>> >> that kcov's ->mmap doesn't set vm_ops. There can be more such cases.
>> >> vma_is_anonymous() is what we need to fix.
>> >>
>> >> ( Although, I find the logic around mmapping the file a second time
>> >>   questionable at best. It seems broken to me. )
>> >>
>> >> It is known that vma_is_anonymous() can produce false positives. I tried
>> >> to fix it once[1], but it backfired[2].
>> >>
>> >> I'll look at this again.
>> >
>> > Below is a patch that seems to work, but it definitely requires more testing.
>> >
>> > Dmitry, could you give it a try in syzkaller?
>>
>> Trying.
>>
>> Not sure what you expect from this. Either way it will be hundreds of
>> crashes before vs hundreds of crashes after ;)
>>
>> But one crash that started popping up is this one; it looks like it's
>> somewhere around the code your patch touches:
>>
>> kasan: CONFIG_KASAN_INLINE enabled
>> kasan: GPF could be caused by NULL-ptr deref or user memory access
>> general protection fault: 0000 [#1] SMP KASAN
>> CPU: 0 PID: 6711 Comm: syz-executor3 Not tainted 4.18.0-rc4+ #43
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>> RIP: 0010:__get_vma_policy+0x61/0x160 mm/mempolicy.c:1620
>
> Right, my bad. Here's a fixup.
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index d508c7844681..12b2b3c7f51e 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -597,6 +597,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
>         memset(&pseudo_vma, 0, sizeof(struct vm_area_struct));
>         pseudo_vma.vm_flags = (VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
>         pseudo_vma.vm_file = file;
> +       pseudo_vma.vm_ops = &anon_vm_ops;
>
>         for (index = start; index < end; index++) {
>                 /*

With this change I don't see anything that stands out, just a typical
mix of crashes like these:

BUG: unable to handle kernel paging request in kfree
INFO: task hung in flush_work
KASAN: slab-out-of-bounds Read in fscache_alloc_cookie
KASAN: use-after-free Read in __queue_work
general protection fault in encode_rpcb_string
lost connection to test machine
no output from test machine
unregister_netdevice: waiting for DEV to become free

So I guess this counts as a +1 for the patch.
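For readers following the thread, here is a minimal userspace sketch of the false positive Kirill describes, assuming (as in kernels of this era, to my understanding) that vma_is_anonymous() is essentially a `!vma->vm_ops` test. The two structs are heavily simplified, hypothetical stand-ins for the kernel types, and the function names vma_is_anonymous_null() and vma_is_anonymous_sentinel() are made up for illustration; only the anon_vm_ops name comes from the fixup above. This is a model of the idea, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures;
 * the real struct vm_area_struct has many more fields. */
struct vm_operations_struct {
	void (*close)(void *vma); /* placeholder callback */
};

struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

/* NULL-based check (assumed to mirror the current behaviour): any VMA
 * without vm_ops counts as anonymous, so a driver ->mmap that leaves
 * vm_ops unset (like kcov's) is misclassified. */
static bool vma_is_anonymous_null(const struct vm_area_struct *vma)
{
	return vma->vm_ops == NULL;
}

/* Sentinel-based check: only VMAs that explicitly point at this
 * dedicated, empty ops table count as anonymous. */
static const struct vm_operations_struct anon_vm_ops = { .close = NULL };

static bool vma_is_anonymous_sentinel(const struct vm_area_struct *vma)
{
	return vma->vm_ops == &anon_vm_ops;
}
```

With a zero-initialized "driver" VMA, vma_is_anonymous_null() returns true (the false positive) while vma_is_anonymous_sentinel() returns false; a VMA whose vm_ops is &anon_vm_ops is recognized by the sentinel check but no longer by the NULL check. The trade-off is that every place that builds an anonymous or pseudo VMA, such as the memset-zeroed pseudo_vma in hugetlbfs_fallocate(), then has to set vm_ops explicitly, which appears to be what the fixup does.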