Hi Darrick J. Wong,

On 2023-03-21 at 13:46:38 -0700, Darrick J. Wong wrote:
> On Mon, Mar 20, 2023 at 02:50:07PM +0800, Pengfei Xu wrote:
> > Hi Dave Chinner and xfs experts,
> >
> > Greeting!
> >
> > There is BUG: unable to handle kernel NULL pointer dereference in
> > xfs_filestream_select_ag in v6.3-rc3:
> >
> > All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230319_210525_xfs_filestream_select_ag
> > Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.c
>
> How the hell am I supposed to extract the fuzzed disk image for
> analysis?
>
> Current Google syzbot provides a lot more information for analysis. Why
> don't you go triage some of their reports instead of spraying more crap
> at the XFS list?
>
Ah, thanks a lot for your suggestion! Next time I will include the
additional analysis that syzkaller provides, like the files below, in all
of my problem reports.

Here is more information:

More detailed analysis from syzkaller (report0):
https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/report0
repro.stats:
https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.stats
VM machine info:
https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/machineInfo0
Newly added repro.report:
https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/repro.report
"
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
XFS (loop0): metadata I/O error in "xfs_read_agf+0xd0/0x2c0" at daddr 0x8001 len 1 error 74
XFS (loop0): page discard on page 00000000b8174cbd, inode 0x46, pos 0.
BUG: kernel NULL pointer dereference, address: 0000000000000010
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 34 Comm: kworker/u4:2 Not tainted 6.3.0-rc2-intel-next-38f821ff82e9+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:arch_atomic_inc arch/x86/include/asm/atomic.h:95 [inline]
RIP: 0010:atomic_inc include/linux/atomic/atomic-instrumented.h:191 [inline]
RIP: 0010:xfs_filestream_create_association fs/xfs/xfs_filestream.c:321 [inline]
RIP: 0010:xfs_filestream_select_ag+0x5d5/0xce0 fs/xfs/xfs_filestream.c:372
Code: 80 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 80 f9 03 00 48 89 c3 48 85 c0 0f 84 3a 05 00 00 e8 9f 8a 80 ff 49 8b 45 18 <f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45
RSP: 0018:ffffc900001274c0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88800dbeae40 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88800791a340 RDI: 0000000000000002
RBP: ffffc90000127548 R08: ffffc90000127400 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffc90000127588 R14: 0000000000000001 R15: ffffc90000127708
FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000000b85c002 CR4: 0000000000f70ee0
PKRU: 55555554
Call Trace:
 <TASK>
 xfs_bmap_btalloc_filestreams fs/xfs/libxfs/xfs_bmap.c:3558 [inline]
 xfs_bmap_btalloc+0x706/0xb90 fs/xfs/libxfs/xfs_bmap.c:3672
 xfs_bmap_alloc_userdata fs/xfs/libxfs/xfs_bmap.c:4046 [inline]
 xfs_bmapi_allocate+0x25b/0x5e0 fs/xfs/libxfs/xfs_bmap.c:4089
 xfs_bmapi_convert_delalloc+0x335/0x6c0 fs/xfs/libxfs/xfs_bmap.c:4554
 xfs_convert_blocks fs/xfs/xfs_aops.c:266 [inline]
 xfs_map_blocks+0x2ff/0x8a0 fs/xfs/xfs_aops.c:389
 iomap_writepage_map fs/iomap/buffered-io.c:1641 [inline]
 iomap_do_writepage+0x43f/0x1070 fs/iomap/buffered-io.c:1803
 write_cache_pages+0x2b8/0x8a0 mm/page-writeback.c:2473
 iomap_writepages+0x3e/0x80 fs/iomap/buffered-io.c:1820
 xfs_vm_writepages+0x97/0xe0 fs/xfs/xfs_aops.c:513
 do_writepages+0x10f/0x240 mm/page-writeback.c:2551
 __writeback_single_inode+0x9f/0xb20 fs/fs-writeback.c:1600
 writeback_sb_inodes+0x301/0x8b0 fs/fs-writeback.c:1891
 wb_writeback+0x18b/0x7c0 fs/fs-writeback.c:2065
 wb_do_writeback fs/fs-writeback.c:2208 [inline]
 wb_workfn+0xc0/0xad0 fs/fs-writeback.c:2248
 process_one_work+0x3b1/0x9e0 kernel/workqueue.c:2390
 worker_thread+0x52/0x660 kernel/workqueue.c:2537
 kthread+0x161/0x1a0 kernel/kthread.c:376
 ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
 </TASK>
Modules linked in:
CR2: 0000000000000010
---[ end trace 0000000000000000 ]---
RIP: 0010:arch_atomic_inc arch/x86/include/asm/atomic.h:95 [inline]
RIP: 0010:atomic_inc include/linux/atomic/atomic-instrumented.h:191 [inline]
RIP: 0010:xfs_filestream_create_association fs/xfs/xfs_filestream.c:321 [inline]
RIP: 0010:xfs_filestream_select_ag+0x5d5/0xce0 fs/xfs/xfs_filestream.c:372
Code: 80 ff 49 89 5d 18 be 08 00 00 00 bf 20 00 00 00 e8 80 f9 03 00 48 89 c3 48 85 c0 0f 84 3a 05 00 00 e8 9f 8a 80 ff 49 8b 45 18 <f0> ff 40 10 49 8b 45 18 48 8b 75 b8 48 89 da 48 89 43 18 48 8b 45
RSP: 0018:ffffc900001274c0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88800dbeae40 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88800791a340 RDI: 0000000000000002
RBP: ffffc90000127548 R08: ffffc90000127400 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffc90000127588 R14: 0000000000000001 R15: ffffc90000127708
FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000000b85c002 CR4: 0000000000f70ee0
PKRU: 55555554
note: kworker/u4:2[34] exited with irqs disabled
------------[ cut here ]------------
WARNING: CPU: 1 PID: 34 at kernel/exit.c:814 do_exit+0xf68/0x1360 kernel/exit.c:814
Modules linked in:
CPU: 1 PID: 34 Comm: kworker/u4:2 Tainted: G D 6.3.0-rc2-intel-next-38f821ff82e9+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:do_exit+0xf68/0x1360 kernel/exit.c:814
Code: ff ff e8 2b 7e 1b 00 4c 89 ee bf 05 06 00 00 e8 7e c1 01 00 e9 a7 f2 ff ff e8 14 7e 1b 00 0f 0b e9 f8 f0 ff ff e8 08 7e 1b 00 <0f> 0b e9 60 f1 ff ff e8 fc 7d 1b 00 48 89 df e8 54 ff 1a 00 e9 ec
RSP: 0018:ffffc90000127eb0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88800791a340 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff88800791a340 RDI: 0000000000000002
RBP: ffffc90000127f18 R08: 0000000000000000 R09: 0000000000000000
R10: 34752f72656b726f R11: 776b203a65746f6e R12: 0000000000000000
R13: 0000000000000009 R14: ffff8880079292c0 R15: ffff888007924600
FS: 0000000000000000(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 000000000b85c002 CR4: 0000000000f70ee0
PKRU: 55555554
Call Trace:
 <TASK>
 make_task_dead+0x100/0x290 kernel/exit.c:981
 rewind_stack_and_make_dead+0x17/0x20 arch/x86/entry/entry_64.S:1541
 </TASK>
irq event stamp: 46556
hardirqs last enabled at (46555): [<ffffffff8218402d>] get_random_u32+0x1dd/0x360 drivers/char/random.c:532
hardirqs last disabled at (46556): [<ffffffff8300582e>] exc_page_fault+0x4e/0x500 arch/x86/mm/fault.c:1551
softirqs last enabled at (37844): [<ffffffff83029bdc>] softirq_handle_end kernel/softirq.c:414 [inline]
softirqs last enabled at (37844): [<ffffffff83029bdc>] __do_softirq+0x31c/0x49c kernel/softirq.c:600
softirqs last disabled at (37835): [<ffffffff8112e774>] invoke_softirq kernel/softirq.c:445 [inline]
softirqs last disabled at (37835): [<ffffffff8112e774>] __irq_exit_rcu kernel/softirq.c:650 [inline]
softirqs last disabled at (37835): [<ffffffff8112e774>] irq_exit_rcu+0xc4/0x100 kernel/softirq.c:662
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
   0:   80 ff 49                cmp    $0x49,%bh
   3:   89 5d 18                mov    %ebx,0x18(%rbp)
   6:   be 08 00 00 00          mov    $0x8,%esi
   b:   bf 20 00 00 00          mov    $0x20,%edi
  10:   e8 80 f9 03 00          call   0x3f995
  15:   48 89 c3                mov    %rax,%rbx
  18:   48 85 c0                test   %rax,%rax
  1b:   0f 84 3a 05 00 00       je     0x55b
  21:   e8 9f 8a 80 ff          call   0xff808ac5
  26:   49 8b 45 18             mov    0x18(%r13),%rax
* 2a:   f0 ff 40 10             lock incl 0x10(%rax) <-- trapping instruction
  2e:   49 8b 45 18             mov    0x18(%r13),%rax
  32:   48 8b 75 b8             mov    -0x48(%rbp),%rsi
  36:   48 89 da                mov    %rbx,%rdx
  39:   48 89 43 18             mov    %rax,0x18(%rbx)
  3d:   48                      rex.W
  3e:   8b                      .byte 0x8b
  3f:   45                      rex.RB
"
> > Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/kconfig_origin
> > v6.3-rc3 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/v6.3-rc3_issue_dmesg.log
> > Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230319_210525_xfs_filestream_select_ag/bisect_info.log
> >
> > Bisected between v6.3-rc2 and v5.11 and found the bad commit:
> > "
> > 8ac5b996bf5199f15b7687ceae989f8b2a410dda
> > xfs: fix off-by-one-block in xfs_discard_folio()
>
> How does *fixing* an off by one error in the page cache produce a crash
> in the filestreams allocator?
>
I'm also surprised by this problem. I'm not sure of the reason, as I know
only a little about XFS.

> > Reverted the commit on top of v6.3-rc2 kernel, at least the BUG dmesg was gone.
> >
> > And this issue could be reproduced in v6.3-rc3 kernel also.
> > Is it possible that the above commit involves a new issue?
> >
> > "
> > [ 62.318653] loop0: detected capacity change from 0 to 65536
> > [ 62.320459] XFS (loop0): Mounting V5 Filesystem d6f69dbd-8c5d-46be-b88e-92c0ae88ceb2
> > [ 62.325152] XFS (loop0): Ending clean mount
> > [ 62.326049] XFS (loop0): Quotacheck needed: Please wait.
> > [ 62.328884] XFS (loop0): Quotacheck: Done.
> > [ 62.363656] XFS (loop0): Metadata CRC error detected at xfs_agf_read_verify+0x10e/0x140, xfs_agf block 0x8001
> > [ 62.364489] XFS (loop0): Unmount and run xfs_repair
> > [ 62.364881] XFS (loop0): First 128 bytes of corrupted metadata buffer:
> > [ 62.365398] 00000000: 58 41 47 46 00 00 00 01 00 00 00 01 00 00 40 00 XAGF..........@.
> > [ 62.366026] 00000010: 00 00 00 02 00 00 00 03 00 00 00 00 00 00 00 01 ................
> > [ 62.366657] 00000020: 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 04 ................
> > [ 62.367285] 00000030: 00 00 00 04 00 00 3b 5f 00 00 3b 5c 00 00 00 00 ......;_..;\....
> > [ 62.367927] 00000040: d6 f6 9d bd 8c 5d 46 be b8 8e 92 c0 ae 88 ce b2 .....]F.........
> > [ 62.368554] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > [ 62.369180] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > [ 62.369806] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > [ 62.370471] XFS (loop0): metadata I/O error in "xfs_read_agf+0xd0/0x200" at daddr 0x8001 len 1 error 74
> > [ 62.371312] XFS (loop0): page discard on page 00000000a6a1237b, inode 0x46, pos 0.
> > [ 62.385968] BUG: kernel NULL pointer dereference, address: 0000000000000010
> > [ 62.386541] #PF: supervisor write access in kernel mode
> > [ 62.386960] #PF: error_code(0x0002) - not-present page
> > [ 62.387370] PGD 0 P4D 0
> > [ 62.387588] Oops: 0002 [#1] PREEMPT SMP NOPTI
> > [ 62.387945] CPU: 1 PID: 74 Comm: kworker/u4:3 Not tainted 6.3.0-rc3-kvm-e8d018dd #1
> > [ 62.388545] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> > [ 62.389426] Workqueue: writeback wb_workfn (flush-7:0)
> > [ 62.389845] RIP: 0010:xfs_filestream_select_ag+0x5d5/0xac0
>
> What source line and/or instruction does %rip point to?
>
> Considering that this is a null pointer deference, you ought to be able
> to identify which pointer access did this.
>
> If you are going to run some scripted tool to randomly corrupt the
> filesystem to find failures, then you have an ethical and moral
> responsibility to do some of the work to narrow down and identify the
> cause of the failure, not just throw them at someone to do all the work.
>
You are right, sorry. Next time I will provide the RIP and all other
detailed information I have. The following is from the repro.report
linked above:
"
BUG: kernel NULL pointer dereference, address: 0000000000000010
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 34 Comm: kworker/u4:2 Not tainted 6.3.0-rc2-intel-next-38f821ff82e9+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:arch_atomic_inc arch/x86/include/asm/atomic.h:95 [inline]
RIP: 0010:atomic_inc include/linux/atomic/atomic-instrumented.h:191 [inline]
RIP: 0010:xfs_filestream_create_association fs/xfs/xfs_filestream.c:321 [inline]
RIP: 0010:xfs_filestream_select_ag+0x5d5/0xce0 fs/xfs/xfs_filestream.c:372
"

Thanks!

BR.
-Pengfei

> --D
>