On 3 Mar 2025, at 11:46, David Hildenbrand wrote:

> On 02.03.25 00:40, Hillf Danton wrote:
>> On Sat, 01 Mar 2025 14:41:20 -0800
>>> Hello,
>>>
>>> syzbot found the following issue on:
>>>
>>> HEAD commit: e5d3fd687aac Add linux-next specific files for 20250218
>>> git tree: linux-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12faf7f8580000
>>> kernel config: https://syzkaller.appspot.com/x/.config?x=4e945b2fe8e5992f
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=fb86166504f57eff29d7
>>> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>>>
>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>
>>> Downloadable assets:
>>> disk image: https://storage.googleapis.com/syzbot-assets/ef079ccd2725/disk-e5d3fd68.raw.xz
>>> vmlinux: https://storage.googleapis.com/syzbot-assets/99f2123d6831/vmlinux-e5d3fd68.xz
>>> kernel image: https://storage.googleapis.com/syzbot-assets/eadfc9520358/bzImage-e5d3fd68.xz
>>>
>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>> Reported-by: syzbot+fb86166504f57eff29d7@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>
>>> evict+0x4e8/0x9a0 fs/inode.c:806
>>> __dentry_kill+0x20d/0x630 fs/dcache.c:660
>>> dput+0x19f/0x2b0 fs/dcache.c:902
>>> __fput+0x60b/0x9f0 fs/file_table.c:472
>>> task_work_run+0x24f/0x310 kernel/task_work.c:227
>>> resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
>>> exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
>>> exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
>>> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
>>> syscall_exit_to_user_mode+0x13f/0x340 kernel/entry/common.c:218
>>> do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89
>>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>> ------------[ cut here ]------------
>>> kernel BUG at mm/rmap.c:1858!
>>> Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
>>> CPU: 1 UID: 0 PID: 6053 Comm: syz.4.27 Not tainted 6.14.0-rc3-next-20250218-syzkaller #0
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
>>> RIP: 0010:try_to_unmap_one+0x3d0d/0x3fa0 mm/rmap.c:1858
>>> Code: c7 c7 80 93 c3 8e 48 89 da e8 ef f3 19 03 e9 68 ca ff ff e8 b5 12 ab ff 48 8b 7c 24 20 48 c7 c6 80 17 36 8c e8 94 d2 f5 ff 90 <0f> 0b e8 9c 12 ab ff 48 8b 7c 24 18 48 c7 c6 40 1c 36 8c e8 7b d2
>>> RSP: 0018:ffffc9000b1be9c0 EFLAGS: 00010246
>>> RAX: 367eb4645686ad00 RBX: 00000000f4000000 RCX: ffffc9000b1be503
>>> RDX: 0000000000000004 RSI: ffffffff8c2aaf60 RDI: ffffffff8c8156e0
>>> RBP: ffffc9000b1bedf0 R08: ffffffff903da477 R09: 1ffffffff207b48e
>>> R10: dffffc0000000000 R11: fffffbfff207b48f R12: 8000000053c008e7
>>> R13: dffffc0000000000 R14: ffffea00014f0000 R15: ffffea00014f0030
>>> FS: 00007f4d2783e6c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 000000110c465fa1 CR3: 000000002a1f6000 CR4: 00000000003526f0
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Call Trace:
>>> <TASK>
>>> __rmap_walk_file+0x420/0x5f0 mm/rmap.c:2774
>>> try_to_unmap+0x219/0x2e0
>>> unmap_folio+0x183/0x1f0 mm/huge_memory.c:3053
>>> __folio_split+0x849/0x16d0 mm/huge_memory.c:3696
>>> truncate_inode_partial_folio+0x9b1/0xdc0 mm/truncate.c:234
>>> shmem_undo_range+0x82f/0x1820 mm/shmem.c:1143
>>
>> Given folio_test_hugetlb(folio) [1], what is weird is a hugetlb page in a
>> shmem mapping.
>>
>
> Right, the problem begins when we call __folio_split() on a hugetlb folio, and the issue is that we seem to find that in the pagecache.
>
> I wonder if there is some weird interaction with our recent folio split changes in next.
> Maybe, for some reason, we end up adding a wrong folio to the pagecache during a split (truncation), and a follow-up split (truncation) finds the wrong folio.
>
> Just a guess, though. CCing Zi Yan.

You are right. I have a fix: https://lore.kernel.org/linux-mm/56EBE3B6-99EA-470E-B2B3-92C9C13032DF@xxxxxxxxxx/

I should have verified folio2 after it is locked and before the second split.

Best Regards,
Yan, Zi