Re: [Syzkaller & bisect] There is "xfs_btree_lookup_get_block" general protection BUG in v6.3-rc3 kernel

Pengfei Xu <pengfei.xu@xxxxxxxxx> · Wed, 22 Mar 2023 11:36:40 +0800

More info, as follows:
VM info: https://github.com/xupengfe/syzkaller_logs/blob/main/230321_082343_xfs_btree_lookup_get_block/machineInfo0
More RIP and best guess info in report0 from syzkaller: https://github.com/xupengfe/syzkaller_logs/blob/main/230321_082343_xfs_btree_lookup_get_block/report0
repro.report: https://github.com/xupengfe/syzkaller_logs/blob/main/230321_082343_xfs_btree_lookup_get_block/repro.report
repro.stats: https://github.com/xupengfe/syzkaller_logs/blob/main/230321_082343_xfs_btree_lookup_get_block/repro.stats
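
One note on the oops in the quoted log below, in case it saves someone a step. This is only my reading of the "Code:" bytes, not something confirmed against the source: the faulting instruction "<49> 8b 96 28 02 00 00" appears to decode to "mov 0x228(%r14),%rdx", and R14 is 0x3 in the register dump. The sum matches CR2, so the pointer being dereferenced looks like a small non-NULL garbage value rather than a true NULL. A minimal sketch of the arithmetic (the variable names are mine):

```shell
#!/bin/sh
# Values copied from the quoted oops; the instruction decoding is an assumption.
r14=0x3      # R14 in the register dump at the time of the fault
disp=0x228   # displacement in "<49> 8b 96 28 02 00 00" (mov 0x228(%r14),%rdx)

# Effective address of the faulting load; compare with the CR2 value in the log.
fault=$(printf '0x%x' $(( $r14 + $disp )))
echo "$fault"
```

The "Code:" line can be checked independently by feeding the oops text to scripts/decodecode from the kernel source tree.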

Please let me know if I missed any information; suggestions are welcome.

Thanks!

On 2023-03-21 at 14:16:29 +0800, Pengfei Xu wrote:
> Hi Darrick J. Wong and xfs experts,
> 
> Greetings!
> 
> Platform: x86 platforms
> There is an "xfs_btree_lookup_get_block" general protection BUG in the v6.3-rc3 kernel.
> 
> All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230321_082343_xfs_btree_lookup_get_block
> Reproducer code: https://github.com/xupengfe/syzkaller_logs/blob/main/230321_082343_xfs_btree_lookup_get_block/repro.c
> Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230321_082343_xfs_btree_lookup_get_block/kconfig_origin
> v6.3-rc3 issue log: https://github.com/xupengfe/syzkaller_logs/blob/main/230321_082343_xfs_btree_lookup_get_block/e8d018dd0257f744ca50a729e3d042cf2ec9da65_dmesg.log
> Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230321_082343_xfs_btree_lookup_get_block/bisect_info.log
> 
> Bisection points to the following commit:
> "
> 7993f1a431bc5271369d359941485a9340658ac3
> xfs: only run COW extent recovery when there are no live extents
> "
> This is only a suspected commit: reverting it on top of v6.3-rc3 made the
> kernel fail, so I could not double-check this issue against a kernel with
> the revert applied.
> 
> "
> [   29.020016] XFS (loop3): Error -5 reserving per-AG metadata reserve pool.
> [   29.022919] BUG: kernel NULL pointer dereference, address: 000000000000022b
> [   29.023777] #PF: supervisor read access in kernel mode
> [   29.024413] #PF: error_code(0x0000) - not-present page
> [   29.025081] PGD 12947067 P4D 12947067 PUD 12976067 PMD 0
> [   29.025825] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [   29.026465] CPU: 0 PID: 544 Comm: repro Not tainted 6.3.0-rc3-e8d018dd0257+ #1
> [   29.026826] XFS (loop5): Starting recovery (logdev: internal)
> [   29.027468] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [   29.029009] XFS (loop5): Metadata corruption detected at xfs_btree_lookup_get_block+0x27a/0x300, xfs_refcountbt block 0x28
> [   29.029843] RIP: 0010:xfs_btree_lookup_get_block+0xc4/0x300
> [   29.031365] XFS (loop5): Unmount and run xfs_repair
> [   29.032089] Code: ff ff 31 ff 41 89 c7 89 c6 e8 48 3d 8a ff 45 85 ff 0f 85 1d 01 00 00 e8 5a 3b 8a ff 4c 8b 75 c0 4d 85 f6 74 37 e8 4c 3b 8a ff <49> 8b 96 28 02 00 00 48 8b 4d c8 48 8b 12 48 89 cf 48 89 4d b0 48
> [   29.032779] XFS (loop5): Failed to recover leftover CoW staging extents, err -117.
> [   29.035256] RSP: 0018:ffffc9000108b910 EFLAGS: 00010246
> [   29.035267] RAX: 0000000000000000 RBX: ffff888013f10000 RCX: ffffffff81a32768
> [   29.036298] XFS (loop5): Filesystem has been shut down due to log error (0x2).
> [   29.037030] RDX: 0000000000000000 RSI: ffff888013eba340 RDI: 0000000000000002
> [   29.037994] XFS (loop5): Please unmount the filesystem and rectify the problem(s).
> [   29.038973] RBP: ffffc9000108b968 R08: ffffc9000108bb88 R09: ffff888013f10000
> [   29.038982] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000007
> [   29.038989] R13: ffffc9000108b9a8 R14: 0000000000000003 R15: 0000000000000000
> [   29.038997] FS:  00007f44ba73a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
> [   29.041082] XFS (loop7): Starting recovery (logdev: internal)
> [   29.041985] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   29.043907] XFS (loop7): Metadata corruption detected at xfs_btree_lookup_get_block+0x27a/0x300, xfs_refcountbt block 0x28
> [   29.043955] CR2: 000000000000022b CR3: 000000000e56e003 CR4: 0000000000770ef0
> [   29.045085] XFS (loop7): Unmount and run xfs_repair
> [   29.045870] PKRU: 55555554
> [   29.046689] XFS (loop7): Failed to recover leftover CoW staging extents, err -117.
> [   29.048110] Call Trace:
> [   29.049079] XFS (loop7): Filesystem has been shut down due to log error (0x2).
> [   29.049753]  <TASK>
> [   29.050160] XFS (loop7): Please unmount the filesystem and rectify the problem(s).
> [   29.051195]  xfs_btree_lookup+0xfe/0x800
> [   29.053556] XFS (loop1): Metadata corruption detected at xfs_btree_lookup_get_block+0x27a/0x300, xfs_refcountbt block 0x28
> [   29.053882]  ? __this_cpu_preempt_check+0x20/0x30
> [   29.054487] XFS (loop1): Unmount and run xfs_repair
> [   29.055921]  ? __pfx_xfs_refcount_recover_extent+0x10/0x10
> [   29.056575] XFS (loop1): Failed to recover leftover CoW staging extents, err -117.
> [   29.057249]  ? __pfx_xfs_refcount_recover_extent+0x10/0x10
> [   29.057993] XFS (loop1): Filesystem has been shut down due to log error (0x2).
> [   29.059020]  xfs_btree_simple_query_range+0x54/0x280
> [   29.059038]  ? write_comp_data+0x2f/0x90
> [   29.059784] XFS (loop1): Please unmount the filesystem and rectify the problem(s).
> [   29.060778]  ? __pfx_xfs_refcount_recover_extent+0x10/0x10
> [   29.062080] XFS (loop4): Metadata corruption detected at xfs_btree_lookup_get_block+0x27a/0x300, xfs_refcountbt block 0x28
> [   29.063021]  xfs_btree_query_range+0x18a/0x1a0
> [   29.063773] XFS (loop4): Unmount and run xfs_repair
> [   29.065266]  ? xfs_refcountbt_init_common+0x3b/0x90
> [   29.065891] XFS (loop4): Failed to recover leftover CoW staging extents, err -117.
> [   29.066560]  xfs_refcount_recover_cow_leftovers+0x18c/0x4a0
> [   29.066583]  ? xfs_perag_grab+0x143/0x340
> [   29.067266] XFS (loop4): Filesystem has been shut down due to log error (0x2).
> [   29.068299]  xfs_reflink_recover_cow+0x79/0xf0
> [   29.069063] XFS (loop4): Please unmount the filesystem and rectify the problem(s).
> [   29.069623]  xlog_recover_finish+0x136/0x420
> [   29.071179] XFS (loop7): Ending recovery (logdev: internal)
> [   29.071227]  ? queue_delayed_work_on+0x9f/0xf0
> [   29.072328] XFS (loop7): Error -5 reserving per-AG metadata reserve pool.
> [   29.072876]  xfs_log_mount_finish+0x187/0x1d0
> [   29.073692] XFS (loop1): Ending recovery (logdev: internal)
> [   29.074252]  xfs_mountfs+0x76e/0xce0
> [   29.074271]  xfs_fs_fill_super+0x7aa/0xdc0
> [   29.075574] XFS (loop1): Error -5 reserving per-AG metadata reserve pool.
> [   29.075809]  get_tree_bdev+0x24b/0x350
> [   29.076634] XFS (loop5): Ending recovery (logdev: internal)
> [   29.077084]  ? __pfx_xfs_fs_fill_super+0x10/0x10
> [   29.077676] XFS (loop5): Error -5 reserving per-AG metadata reserve pool.
> [   29.078560]  xfs_fs_get_tree+0x25/0x30
> [   29.078581]  vfs_get_tree+0x3b/0x140
> [   29.079464] XFS (loop4): Ending recovery (logdev: internal)
> [   29.079871]  path_mount+0x769/0x10f0
> [   29.080533] XFS (loop4): Error -5 reserving per-AG metadata reserve pool.
> [   29.081442]  ? write_comp_data+0x2f/0x90
> [   29.085330]  do_mount+0xaf/0xd0
> [   29.085811] XFS (loop2): Starting recovery (logdev: internal)
> [   29.085825]  __x64_sys_mount+0x14b/0x160
> [   29.087213]  do_syscall_64+0x3b/0x90
> [   29.087534] XFS (loop2): Metadata corruption detected at xfs_btree_lookup_get_block+0x27a/0x300, xfs_refcountbt block 0x28
> [   29.087758]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [   29.089249] XFS (loop2): Unmount and run xfs_repair
> [   29.089936] RIP: 0033:0x7f44ba8673ae
> [   29.090639] XFS (loop2): Failed to recover leftover CoW staging extents, err -117.
> [   29.091112] Code: 48 8b 0d f5 8a 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c2 8a 0c 00 f7 d8 64 89 01 48
> [   29.092123] XFS (loop2): Filesystem has been shut down due to log error (0x2).
> [   29.094617] RSP: 002b:00007fffeaff1b78 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
> [   29.094632] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f44ba8673ae
> [   29.094640] RDX: 0000000020009580 RSI: 00000000200095c0 RDI: 00007fffeaff1cb0
> [   29.095630] XFS (loop2): Please unmount the filesystem and rectify the problem(s).
> [   29.096664] RBP: 00007fffeaff1d40 R08: 00007fffeaff1bb0 R09: 0000000000000000
> [   29.099193] XFS (loop2): Ending recovery (logdev: internal)
> [   29.099660] R10: 0000000000000800 R11: 0000000000000206 R12: 0000000000401260
> [   29.100702] XFS (loop2): Error -5 reserving per-AG metadata reserve pool.
> [   29.101426] R13: 00007fffeaff1e80 R14: 0000000000000000 R15: 0000000000000000
> [   29.104290]  </TASK>
> [   29.104623] Modules linked in:
> [   29.105081] CR2: 000000000000022b
> [   29.105560] ---[ end trace 0000000000000000 ]---
> [   29.106207] RIP: 0010:xfs_btree_lookup_get_block+0xc4/0x300
> [   29.106985] Code: ff ff 31 ff 41 89 c7 89 c6 e8 48 3d 8a ff 45 85 ff 0f 85 1d 01 00 00 e8 5a 3b 8a ff 4c 8b 75 c0 4d 85 f6 74 37 e8 4c 3b 8a ff <49> 8b 96 28 02 00 00 48 8b 4d c8 48 8b 12 48 89 cf 48 89 4d b0 48
> [   29.109472] RSP: 0018:ffffc9000108b910 EFLAGS: 00010246
> [   29.110190] RAX: 0000000000000000 RBX: ffff888013f10000 RCX: ffffffff81a32768
> [   29.111153] RDX: 0000000000000000 RSI: ffff888013eba340 RDI: 0000000000000002
> [   29.112114] RBP: ffffc9000108b968 R08: ffffc9000108bb88 R09: ffff888013f10000
> [   29.113072] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000007
> [   29.114030] R13: ffffc9000108b9a8 R14: 0000000000000003 R15: 0000000000000000
> [   29.114989] FS:  00007f44ba73a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
> [   29.116069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   29.116860] CR2: 000000000000022b CR3: 000000000e56e003 CR4: 0000000000770ef0
> [   29.117831] PKRU: 55555554
> [   29.118220] note: repro[544] exited with irqs disabled
> [   29.452491] loop0: detected capacity change from 0 to 32768
> "
> 
> I see a similar issue reported by syzbot:
> https://syzkaller.appspot.com/bug?id=e2907149c69cbccae0842eb502b8af4f6fac52a0
> But it does not provide the bisected commit, because the bisection failed.
> 
> I hope the above info is helpful.
> 
> Thanks!
> 
> ---
> 
> If you don't need the following environment to reproduce the problem or if you
> already have one, please ignore the following information.
> 
> How to reproduce:
> git clone https://gitlab.com/xupengfe/repro_vm_env.git
> cd repro_vm_env
> tar -xvf repro_vm_env.tar.gz
> cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64; I used v7.1.0
>    // start3.sh loads the v6.2-rc5 kernel image bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65
>    // You can change bzImage_xxx to any image you want
> Use the command below to log in; there is no password for root.
> ssh -p 10023 root@localhost
> 
> After logging in to the VM (virtual machine) successfully, build the reproducer,
> copy the binary to the VM as shown below, and reproduce the problem in the VM:
> gcc -pthread -o repro repro.c
> scp -P 10023 repro root@localhost:/root/
> 
> Get the bzImage for the target kernel:
> Copy the target kconfig to kernel_src/.config, then build:
> make olddefconfig
> make -jx bzImage           // x should be less than or equal to the number of CPUs on your machine
> 
> Put the resulting bzImage file into the start3.sh above to load the target kernel in the VM.
> 
> 
> Tips:
> If you already have qemu-system-x86_64, please ignore the info below.
> To install qemu v7.1.0:
> git clone https://github.com/qemu/qemu.git
> cd qemu
> git checkout -f v7.1.0
> mkdir build
> cd build
> yum install -y ninja-build.x86_64
> ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
> make
> make install
> 
> Thanks!
> BR.
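
In case it helps anyone following the quoted steps above, here is a small sketch that reports which of the tools named in those steps are missing from PATH before starting. It is not part of the repro_vm_env package, and the function name check_tools is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the names of any given tools not found on PATH.
check_tools() {
    missing=""
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the tool name
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Strip the leading space; prints an empty line when nothing is missing
    echo "${missing# }"
}

# Tools used in the quoted reproduction steps
check_tools git tar gcc make ssh scp qemu-system-x86_64
```

An empty line means every tool was found; otherwise the missing ones are listed, separated by spaces.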