On Tue, Feb 18, 2025 at 2:09 AM Alan Huang <mmpgouride@xxxxxxxxx> wrote:
>
> On Feb 18, 2025, at 01:12, Kairui Song <ryncsn@xxxxxxxxx> wrote:
> >
> > On Mon, Feb 17, 2025 at 12:13 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
> >>
> >> On Sat, Feb 15, 2025 at 7:24 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>> On Fri, 14 Feb 2025 10:11:19 -0800 syzbot <syzbot+38a0cbd267eff2d286ff@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>>> syzbot has found a reproducer for the following issue on:
> >>>
> >>> Thanks. I doubt if bcachefs is implicated in this?
> >>>
> >>>> HEAD commit: 128c8f96eb86 Merge tag 'drm-fixes-2025-02-14' of https://g..
> >>>> git tree: upstream
> >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=148019a4580000
> >>>> kernel config: https://syzkaller.appspot.com/x/.config?x=c776e555cfbdb82d
> >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=38a0cbd267eff2d286ff
> >>>> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12328bf8580000
> >>>>
> >>>> Downloadable assets:
> >>>> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7feb34a89c2a/non_bootable_disk-128c8f96.raw.xz
> >>>> vmlinux: https://storage.googleapis.com/syzbot-assets/a97f78ac821e/vmlinux-128c8f96.xz
> >>>> kernel image: https://storage.googleapis.com/syzbot-assets/f451cf16fc9f/bzImage-128c8f96.xz
> >>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/a7da783f97cf/mount_3.gz
> >>>>
> >>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >>>> Reported-by: syzbot+38a0cbd267eff2d286ff@xxxxxxxxxxxxxxxxxxxxxxxxx
> >>>>
> >>>> ------------[ cut here ]------------
> >>>> WARNING: CPU: 0 PID: 5459 at mm/list_lru.c:96 lock_list_lru_of_memcg+0x39e/0x4d0 mm/list_lru.c:96
> >>>
> >>> VM_WARN_ON(!css_is_dying(&memcg->css));
> >>
> >> I'm checking this. The last time this was triggered, it was caused by
> >> a list_lru user that did not initialize the memcg list_lru properly before
> >> list_lru reclaim started, and it was fixed by:
> >> https://lore.kernel.org/all/20241222122936.67501-1-ryncsn@xxxxxxxxx/T/
> >>
> >> This shouldn't be a big issue; maybe there are leaks that will be
> >> fixed upon reparenting, and this newly added sanity check might be too
> >> aggressive, though I'm not 100% sure.
> >>
> >> Unfortunately I couldn't reproduce the issue locally with the
> >> reproducer yet. I will keep the test running and see if it can hit this
> >> WARN_ON.
> >
> > So far I am still unable to trigger this VM_WARN_ON using the
> > reproducer, and I'm seeing many other random crashes.
> >
> > But after I changed the .config a bit, adding more debug configs
> > (SLAB_FREELIST_HARDENED, DEBUG_PAGEALLOC), the following crash showed up
> > and is triggered immediately after I start the test:
> >
> > [ T1242] BUG: unable to handle page fault for address: ffff888054c60000
> > [ T1242] #PF: supervisor read access in kernel mode
> > [ T1242] #PF: error_code(0x0000) - not-present page
> > [ T1242] PGD 19e01067 P4D 19e01067 PUD 19e04067 PMD 7fc5c067 PTE
> > 800fffffab39f060
> > [ T1242] Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
> > [ T1242] CPU: 1 UID: 0 PID: 1242 Comm: kworker/1:1H Not tainted
> > 6.14.0-rc2-00185-g128c8f96eb86 #2
> > [ T1242] Hardware name: Red Hat KVM/RHEL-AV, BIOS
> > 1.16.0-4.module+el8.8.0+664+0a3d6c83 04/01/2014
> > [ T1242] Workqueue: bcachefs_btree_read_complete btree_node_read_work
> > [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> > [ T6058] bcachefs (loop2): empty btree root xattrs
> > [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> > 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> > ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> > [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> > [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> > [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> > [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> > [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> > [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> > [ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000)
> > knlGS:0000000000000000
> > [ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> > [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ T1242] Call Trace:
> > [ T1242] <TASK>
> > [ T1242] bch2_btree_node_read_done+0x1d20/0x53a0
> > [ T1242] btree_node_read_work+0x54d/0xdc0
> > [ T1242] process_scheduled_works+0xaf8/0x17f0
> > [ T1242] worker_thread+0x89d/0xd60
> > [ T1242] kthread+0x722/0x890
> > [ T1242] ret_from_fork+0x4e/0x80
> > [ T1242] ret_from_fork_asm+0x1a/0x30
> > [ T1242] </TASK>
> > [ T1242] Modules linked in:
> > [ T1242] ---[ end trace 0000000000000000 ]---
> > [ T1242] RIP: 0010:validate_bset_keys+0xae3/0x14f0
> > [ T1242] Code: 49 39 df 0f 87 fc 09 00 00 e8 79 54 a8 fd 41 0f b7 c6
> > 48 8b 4c 24 68 48 8d 04 c1 4c 29 f8 48 c1 e8 03 89 c1 48 89 de 4c 89
> > ff <f3> 48 a5 48 8b bc 24 c8 00 00 08
> > [ T1242] RSP: 0018:ffffc900070a72c0 EFLAGS: 00010206
> > [ T1242] RAX: 000000000000ec0f RBX: ffff888054c20110 RCX: 0000000000006c31
> > [ T1242] RDX: 0000000000000000 RSI: ffff888054c60000 RDI: ffff888054c5ff90
> > [ T1242] RBP: ffffc900070a7570 R08: ffff888065e001af R09: 1ffff1100cbc0035
> > [ T1242] R10: dffffc0000000000 R11: ffffed100cbc0036 R12: ffff888054c2009e
> > [ T1242] R13: dffffc0000000000 R14: 000000000000ec0f R15: ffff888054c200a0
> > [ T1242] FS: 0000000000000000(0000) GS:ffff88807ea00000(0000)
> > knlGS:0000000000000000
> > [ T1242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ T1242] CR2: ffff888054c60000 CR3: 000000006cea6000 CR4: 00000000000006f0
> > [ T1242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ T1242] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ T1242] Kernel panic - not syncing: Fatal exception
> > [ T1242] Kernel Offset: disabled
> > [ T1242] Rebooting in 86400 seconds..
> >
> > It's caused by the memmove_u64s_down in validate_bset_keys of
> > fs/bcachefs/btree_io.c:
> > -> memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
>
> Might need this.
>
> diff --git a/fs/bcachefs/btree_io.c b/fs/bcachefs/btree_io.c
> index e71b278672b6..fb53174cb735 100644
> --- a/fs/bcachefs/btree_io.c
> +++ b/fs/bcachefs/btree_io.c
> @@ -997,7 +997,7 @@ static int validate_bset_keys(struct bch_fs *c, struct btree *b,
> }
> got_good_key:
> le16_add_cpu(&i->u64s, -next_good_key);
> - memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) k);
> + memmove_u64s_down(k, bkey_p_next(k), (u64 *) vstruct_end(i) - (u64 *) bkey_p_next(k));
> set_btree_node_need_rewrite(b);
> }
> fsck_err:
>

Thanks, but this didn't fix everything. I think the problem is more
complex: syzbot seems to be mounting a damaged bcachefs image (on
purpose, I think), so vstruct_end(i) is already returning an offset that
is out of bounds. I retriggered it and printed some more debug info:
i->_data is ffff88806d5c00a0, i->u64s is 60928, and the faulting address
is ffff88806d600000.
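
For reference, here is a quick standalone check of those numbers
(ordinary userspace C, not kernel code; it only assumes that
vstruct_end(i) is i->_data plus i->u64s 64-bit words):

/*
 * Back-of-the-envelope check of the values reported above: how far does
 * _data + u64s * sizeof(u64) reach past the faulting address?
 */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t data  = 0xffff88806d5c00a0ULL;	/* i->_data */
	uint64_t u64s  = 60928;			/* i->u64s */
	uint64_t fault = 0xffff88806d600000ULL;	/* faulting address */
	uint64_t end   = data + u64s * 8;	/* claimed vstruct_end(i) */

	printf("claimed end of keys: 0x%llx\n", (unsigned long long)end);
	printf("fault is %llu bytes past _data\n",
	       (unsigned long long)(fault - data));
	printf("claimed end is %llu bytes past the fault\n",
	       (unsigned long long)(end - fault));
	return 0;
}

So i->u64s claims roughly 476 KiB of keys, the copy faults about 256 KiB
past _data (presumably the end of the mapped node buffer), and the
claimed end is still about 220 KiB beyond the faulting page, which looks
like a u64s value corrupted by the damaged image rather than a small
off-by-one.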
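
That said, the length-argument change in the patch above still looks
right on its own. Below is a minimal sketch of the over-read it removes,
assuming memmove_u64s_down(dst, src, n) copies n 64-bit words from src
down to dst; the names k, end, k_u64s and area are illustrative, not the
real bcachefs structures:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for memmove_u64s_down(): reads src[0] .. src[n - 1]. */
static void move_u64s_down(uint64_t *dst, const uint64_t *src, size_t n)
{
	memmove(dst, src, n * sizeof(uint64_t));
}

/* Drop a bad key 'k' of 'k_u64s' words from a packed key area ending at 'end'. */
static void drop_bad_key(uint64_t *k, uint64_t *end, size_t k_u64s)
{
	uint64_t *next = k + k_u64s;	/* plays the role of bkey_p_next(k) */

	/*
	 * With length (end - k) the source is 'next', so the copy reads
	 * k_u64s words past 'end' -- the over-read in the original call:
	 *
	 *	move_u64s_down(k, next, end - k);
	 *
	 * With length (end - next) it copies exactly the keys that remain:
	 */
	move_u64s_down(k, next, end - next);
}

int main(void)
{
	uint64_t area[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

	/* Drop a 2-word "key" starting at area[2]; the area ends at area + 8. */
	drop_bad_key(area + 2, area + 8, 2);
	return 0;
}

With the original (end - k) length, the same call would read
area[4]..area[9], i.e. two words past the end of the array. That
over-read is bounded by the size of the dropped key, though, which is
why fixing it alone doesn't help once a corrupted u64s pushes
vstruct_end(i) itself far past the buffer.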