On Thu, Aug 03, 2023 at 04:56:03PM +0800, Yikebaer Aizezi wrote:
> console output:
> https://drive.google.com/file/d/1Lq71bFwtEDix82PEf_193CLG6uh1Pjj9/view?usp=drive_link

I dug through this, and what I found troubles me.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 13067 at mm/gup.c:229 try_grab_page+0x2dd/0x3a0
Modules linked in:
CPU: 0 PID: 13067 Comm: syz-executor Tainted: G B 6.5.0-rc2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
RIP: 0010:try_grab_page+0x2dd/0x3a0
Code: ff be 04 00 00 00 4c 89 e7 e8 cf fa 13 00 f0 41 ff 04 24 e8 65 96 cb ff 45 31 e4 5b 44 89 e0 5d 41 5c 41 5d c3 e8 53 96 cb ff <0f> 0b e8 4c 96 cb ff 41 bc f4 ff ff ff 5b 44 89 e0 5d 41 5c 41 5d
RSP: 0018:ffffc9000c2777e0 EFLAGS: 00010212
RAX: 0000000000000247 RBX: ffffea00003ae340 RCX: ffffc90002bb1000
RDX: 0000000000040000 RSI: ffffffff81ad81ed RDI: ffffea00003ae374
RBP: ffffea00003ae340 R08: 0000000000000000 R09: fffff94000075c6e
R10: ffffea00003ae377 R11: 0000000000084001 R12: ffffea00003ae374
R13: 0000000000210002 R14: ffffea00003ae340 R15: 000000000eb8d225
FS: 00007f5841a13640(0000) GS:ffff888063e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000500310 CR3: 0000000018d0c000 CR4: 0000000000750ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 ? __warn+0xe2/0x340
 ? try_grab_page+0x2dd/0x3a0
 ? report_bug+0x25d/0x460
 ? handle_bug+0x3c/0x70
 ? exc_invalid_op+0x14/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? try_grab_page+0x2dd/0x3a0
 ? try_grab_page+0x2dd/0x3a0
 follow_page_pte+0x18c/0x1610
 ? try_grab_page+0x3a0/0x3a0
 ? rcu_is_watching+0xe/0xb0
 follow_page_mask+0x2e4/0xbd0
 __get_user_pages+0x3fa/0xcf0
 ? follow_page_mask+0xbd0/0xbd0
 ? down_read_killable+0x146/0x4f0
 ? down_read_interruptible+0x4f0/0x4f0
 ? rcu_is_watching+0xe/0xb0
 __gup_longterm_locked+0x5fa/0x1ec0
 ? io_schedule_timeout+0x150/0x150
 ? rcu_is_watching+0xe/0xb0
 ? get_user_pages_unlocked+0x580/0x580
 ? lock_release+0x4f7/0x670
 ? internal_get_user_pages_fast+0xe27/0x2690
 ? lock_downgrade+0x690/0x690
 ? preempt_schedule_common+0x45/0xb0
 ? pud_huge+0x9c/0xe0
 ? pmd_huge+0xe0/0xe0
 internal_get_user_pages_fast+0x119b/0x2690
 ? mtree_load+0x1df/0x980
 ? __gup_device_huge+0x530/0x530
 ? rcu_is_watching+0xe/0xb0
 ? lock_release+0x4f7/0x670
 get_user_pages_fast+0x95/0xe0
 ? get_user_pages_fast_only+0xe0/0xe0
 do_get_mempolicy+0x50c/0xd20
 ? sp_delete+0xf0/0xf0
 ? seccomp_notify_ioctl+0xd80/0xd80
 __x64_sys_get_mempolicy+0x187/0x2a0
 ? __ia32_sys_migrate_pages+0xf0/0xf0
 ? __secure_computing+0x1ff/0x360
 do_syscall_64+0x35/0xb0
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x47959d
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b4 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5841a13068 EFLAGS: 00000246 ORIG_RAX: 00000000000000ef
RAX: ffffffffffffffda RBX: 000000000059c0a0 RCX: 000000000047959d
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 000000000059c0a0 R08: 0000000000000003 R09: 0000000000000000
R10: 0000000020ff9000 R11: 0000000000000246 R12: 000000000059c0ac
R13: 000000000000000b R14: 0000000000437250 R15: 00007f58419f3000
 </TASK>
Kernel panic - not syncing: kernel: panic_on_warn set ...

> WARNING: CPU: 0 PID: 13067 at mm/gup.c:229 try_grab_page+0x2dd/0x3a0

That's this line:

	if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))

Called from:

 follow_page_pte+0x18c/0x1610

That did:

	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
	pte = ptep_get(ptep);
	page = vm_normal_page(vma, address, pte);
	ret = try_grab_page(page, flags);

So we grabbed the PTE lock, looked up the PTE, translated that into
a page ... and found a page with a zero (or negative) refcount.
That's Really Bad.

I think it was a zero refcount because r08 is 0 and I don't see any
other registers which have a plausible negative 32-bit number in them.

Yikebaer, could I trouble you to add this:

+++ b/mm/gup.c
@@ -226,7 +226,7 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
 {
 	struct folio *folio = page_folio(page);

-	if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
+	if (VM_WARN_ON_ONCE_FOLIO(folio_ref_count(folio) <= 0, folio))
 		return -ENOMEM;

 	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))

and rerun the syzkaller? That'll give us some more information about
what has happened, although it won't tell us why it happened.

We might need to catch someone decrementing the refcount to lower than
the mapcount to catch this ... which will be tricky, given the other
things we reuse the mapcount for.