Re: [PATCH v8 15/21] mm: Change failure of MAP_FIXED to restoring the gap on failure

Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> · Tue, 3 Sep 2024 13:27:42 +0100

Hi Andrew - TL;DR of this is - please apply the fix patch attached below to
fix a problem in this series, thanks! :)

On Tue, Sep 03, 2024 at 12:00:04PM GMT, Lorenzo Stoakes wrote:
> On Tue, Sep 03, 2024 at 11:07:38AM GMT, Pengfei Xu wrote:
> > Hi Liam R. Howlett,
> >
> > Greetings!
> >
> > There is WARNING in __split_vma in next-20240902 in syzkaller fuzzing test.
> > Bisected and found first bad commit:
> > "
> > 3483c95414f9 mm: change failure of MAP_FIXED to restoring the gap on failure
> > "
> > It's same as below patch.
> > After reverted the above commit on top of next-20240902, this issue was gone.
> >
> > All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/240903_092137___split_vma
> > Syzkaller repro code: https://github.com/xupengfe/syzkaller_logs/blob/main/240903_092137___split_vma/repro.c
> > Syzkaller repro syscall steps: https://github.com/xupengfe/syzkaller_logs/blob/main/240903_092137___split_vma/repro.prog
> > Syzkaller report: https://github.com/xupengfe/syzkaller_logs/blob/main/240903_092137___split_vma/repro.report
> > Kconfig(make olddefconfig): https://github.com/xupengfe/syzkaller_logs/blob/main/240903_092137___split_vma/kconfig_origin
> > Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/240903_092137___split_vma/bisect_info.log
> > bzImage: https://github.com/xupengfe/syzkaller_logs/raw/main/240903_092137___split_vma/bzImage_ecc768a84f0b8e631986f9ade3118fa37852fef0.tar.gz
> > Issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/240903_092137___split_vma/ecc768a84f0b8e631986f9ade3118fa37852fef0_dmesg.log
> >
> > And "KASAN: slab-use-after-free Read in acct_collect" also pointed to the
> > same commit, all detailed info:
> > https://github.com/xupengfe/syzkaller_logs/tree/main/240903_090000_acct_collect
> >
> > "
>
> Thanks for the report! Looking into it.
>
> > [   19.953726] cgroup: Unknown subsys name 'net'
> > [   20.045121] cgroup: Unknown subsys name 'rlimit'
> > [   20.138332] ------------[ cut here ]------------
> > [   20.138634] WARNING: CPU: 1 PID: 732 at include/linux/maple_tree.h:733 __split_vma+0x4d7/0x1020
> > [   20.139075] Modules linked in:
> > [   20.139245] CPU: 1 UID: 0 PID: 732 Comm: repro Not tainted 6.11.0-rc6-next-20240902-ecc768a84f0b #1
> > [   20.139779] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> > [   20.140337] RIP: 0010:__split_vma+0x4d7/0x1020
> > [   20.140572] Code: 89 ee 48 8b 40 10 48 89 c7 48 89 85 00 ff ff ff e8 8e 61 a7 ff 48 8b 85 00 ff ff ff 4c 39 e8 0f 83 ea fd ff ff e8 b9 5e a7 ff <0f> 0b e9 de fd ff ff 48 8b 85 30 ff ff ff 48 83 c0 10 48 89 85 18
> > [   20.141476] RSP: 0018:ffff8880217379a0 EFLAGS: 00010293
> > [   20.141749] RAX: 0000000000000000 RBX: ffff8880132351e0 RCX: ffffffff81bf6117
> > [   20.142106] RDX: ffff888012c30000 RSI: ffffffff81bf6187 RDI: 0000000000000006
> > [   20.142457] RBP: ffff888021737aa0 R08: 0000000000000001 R09: ffffed100263d3cd
> > [   20.142814] R10: 0000000020ff9000 R11: 0000000000000001 R12: ffff888021737e40
> > [   20.143173] R13: 0000000020ff9000 R14: 0000000020ffc000 R15: ffff888013235d20
> > [   20.143529] FS:  00007eff937f9740(0000) GS:ffff88806c500000(0000) knlGS:0000000000000000
> > [   20.144308] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   20.144600] CR2: 0000000020000040 CR3: 000000001f464003 CR4: 0000000000770ef0
> > [   20.144958] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [   20.145313] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> > [   20.145665] PKRU: 55555554
> > [   20.145809] Call Trace:
> > [   20.145940]  <TASK>
> > [   20.146056]  ? show_regs+0x6d/0x80
> > [   20.146247]  ? __warn+0xf3/0x380
> > [   20.146431]  ? report_bug+0x25e/0x4b0
> > [   20.146650]  ? __split_vma+0x4d7/0x1020
>
> Have repro'd locally. This is, unsurprisingly, on this line (even if trace above
> doesn't decode to it unfortunately):
>
> 	vma_iter_config(vmi, new->vm_start, new->vm_end);
>
> The VMA in question spans 0x20ff9000, 0x21000000 so is 7 pages in size.
>
> At the point of invoking vma_iter_config(), the vma iterator points at
> 0x20ff9001, but we try to position it to 0x20ff9000.
>
> It seems the issue is that in do_vmi_munmap(), after vma_find() is called, we
> find a VMA at 0x20ff9000, but the VMI is positioned to 0x20ff9001...!
>
> Perhaps maple tree corruption in a previous call somehow?
>
>
> I can interestingly only repro this if I clear the qemu image each time, I'm
> guessing this is somehow tied to the instantiation of the cgroup setup or such?
>
> Am continuing the investigation.
>

[snip]

OK I turned on CONFIG_DEBUG_VM_MAPLE_TREE and am hitting
VM_WARN_ON_ONCE_MM(vma->vm_start != vmi_start, mm) after gather_failed is hit in
mmap_region() as a result of call_mmap() returning an error.

This is invoking kernfs_fop_mmap(), which returns -ENODEV because the
KERNFS_HAS_MMAP flag has not been set for the cgroup file being mapped.

This results in mmap_region() jumping to unmap_and_free_vma, which unmaps the
page tables in the region and goes on to abort the unmap operation.

The validate_mm() that fails is called by vms_complete_munmap_vmas() which was
invoked by vms_abort_munmap_vmas().

The tree is then corrupted:

vma ffff888013414d20 start 0000000020ff9000 end 0000000021000000 mm ffff88800d06ae40
prot 25 anon_vma ffff8880132cc660 vm_ops 0000000000000000
pgoff 20ff9 file 0000000000000000 private_data 0000000000000000
flags: 0x8100077(read|write|exec|mayread|maywrite|mayexec|account|softdirty)
tree range: ffff888013414d20 start 20ff9001 end 20ffffff

The tree range incorrectly starts at 0x20ff9001 rather than 0x20ff9000.
This is a very telling offset... the programmer's favourite off-by-1 :)
That made me think of how mas operations have an _inclusive_ end whereas
VMA ones have an _exclusive_ one.

And so I tracked the cause down to vms_abort_munmap_vmas(), which was
invoking mas_set_range() with vms->end (exclusive) as if it were
inclusive, resulting in index 0x20ff9000 being wrongly cleared.

The solution is simply to subtract one from vms->end, as done in the
attached fix-patch.
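
To illustrate the inclusive/exclusive mismatch, here is a minimal userspace
sketch. It is a toy model only, not the maple tree API: struct toy_tree,
toy_store_range(), toy_first_index() and toy_vma_start_after_clear() are
hypothetical names invented for this example, and the small indices 4, 8
and 16 merely stand in for real addresses. It assumes a store over
[first, last] with both ends inclusive, as mas ranges are.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 32

/* Hypothetical toy model: one slot per index, NOT the real maple tree. */
struct toy_tree {
	const void *slot[TOY_SLOTS];
};

/* Store entry over [first, last], both ends inclusive, as mas ranges are. */
static void toy_store_range(struct toy_tree *t, size_t first, size_t last,
			    const void *entry)
{
	assert(first <= last && last < TOY_SLOTS);
	for (size_t i = first; i <= last; i++)
		t->slot[i] = entry;
}

/* Return the lowest index holding entry, or TOY_SLOTS if there is none. */
static size_t toy_first_index(const struct toy_tree *t, const void *entry)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
		if (t->slot[i] == entry)
			return i;
	return TOY_SLOTS;
}

/*
 * Map a "VMA" over [8, 16), then clear the gap [4, 8) just below it.
 * With buggy != 0 the exclusive end is passed as if it were inclusive.
 * Returns the index at which the tree now thinks the VMA starts.
 */
static size_t toy_vma_start_after_clear(int buggy)
{
	struct toy_tree t = { { NULL } };
	static const char vma = 0;	/* any unique address will do */
	size_t gap_start = 4, gap_end = 8;	/* exclusive end, VMA-style */

	toy_store_range(&t, 8, 16 - 1, &vma);
	toy_store_range(&t, gap_start, buggy ? gap_end : gap_end - 1, NULL);
	return toy_first_index(&t, &vma);
}
```

In this model toy_vma_start_after_clear(1) reports the VMA starting at 9
rather than 8, the same shape as the 0x20ff9001 vs 0x20ff9000 mismatch
above, while toy_vma_start_after_clear(0) leaves it at 8.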

I confirmed this fixed the issue as I was able to set up a reliable repro
locally.

Thanks for the report! Great find.

----8<----