Hi everyone, Shortly after 6.5 was tagged, I started seeing the following stacktraces when running xfs through fstests on arm64. Curiously, x86_64 does not seem affected. At first I thought this might be caused by the bug fixes in my development tree, so I started bisecting them. Bisecting identified a particular patchset[1] that didn't seem like it was the culprit. A couple of days later, one of my arm64 vms with that patch reverted crashed in the same way. So, clearly not the culprit. [1] https://lore.kernel.org/linux-xfs/169335056933.3525521.6054773682023937525.stgit@frogsfrogsfrogs/ run fstests generic/162 at 2023-09-11 22:06:42 spectre-v4 mitigation disabled by command-line option XFS (sda2): EXPERIMENTAL metadata directory feature in use. Use at your own risk! XFS (sda2): EXPERIMENTAL realtime allocation group feature in use. Use at your own risk! XFS (sda2): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! XFS (sda2): Mounting V5 Filesystem ec3b0e05-f8c2-4fe5-a6cf-d8c5819c6ee7 XFS (sda2): Ending clean mount XFS (sda3): EXPERIMENTAL metadata directory feature in use. Use at your own risk! XFS (sda3): EXPERIMENTAL realtime allocation group feature in use. Use at your own risk! XFS (sda3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! XFS (sda3): Mounting V5 Filesystem 0c328923-fa81-458f-846c-45bb6828aa0f XFS (sda3): Ending clean mount XFS (sda3): Quotacheck needed: Please wait. XFS (sda3): Quotacheck: Done. XFS (sda3): Unmounting Filesystem 0c328923-fa81-458f-846c-45bb6828aa0f XFS (sda3): EXPERIMENTAL metadata directory feature in use. Use at your own risk! XFS (sda3): EXPERIMENTAL realtime allocation group feature in use. Use at your own risk! XFS (sda3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! XFS (sda3): Mounting V5 Filesystem 287e61f0-a509-4dd1-b788-70076dda0efd XFS (sda3): Ending clean mount XFS (sda3): Quotacheck needed: Please wait. XFS (sda3): Quotacheck: Done. XFS (sda3): Unmounting Filesystem 287e61f0-a509-4dd1-b788-70076dda0efd XFS (sda3): EXPERIMENTAL metadata directory feature in use. Use at your own risk! XFS (sda3): EXPERIMENTAL realtime allocation group feature in use. Use at your own risk! XFS (sda3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! XFS (sda3): Mounting V5 Filesystem 287e61f0-a509-4dd1-b788-70076dda0efd XFS (sda3): Ending clean mount ================================================================== BUG: KASAN: slab-use-after-free in d_alloc_parallel+0x850/0xa00 Read of size 4 at addr fffffc0176c88360 by task xfs_io/1341106 CPU: 0 PID: 1341106 Comm: xfs_io Not tainted 6.5.0-xfsa #12 7284e9be14b81c73627c489aa5df798cf1143960 Hardware name: QEMU KVM Virtual Machine, BIOS 1.5.1 06/16/2021 Call trace: dump_backtrace+0x9c/0x100 show_stack+0x20/0x38 dump_stack_lvl+0x48/0x60 print_report+0xf4/0x5b0 kasan_report+0xa4/0xf0 __asan_report_load4_noabort+0x20/0x30 d_alloc_parallel+0x850/0xa00 __lookup_slow+0x11c/0x2e8 walk_component+0x1f8/0x498 link_path_walk.part.0.constprop.0+0x41c/0x960 path_lookupat+0x70/0x590 filename_lookup+0x144/0x368 user_path_at_empty+0x54/0x88 do_faccessat+0x3f0/0x7c0 __arm64_sys_faccessat+0x78/0xb0 do_el0_svc+0x124/0x318 el0_svc+0x34/0xe8 el0t_64_sync_handler+0x13c/0x158 el0t_64_sync+0x190/0x198 Allocated by task 1340914: kasan_save_stack+0x2c/0x58 kasan_set_track+0x2c/0x40 kasan_save_alloc_info+0x24/0x38 __kasan_slab_alloc+0x74/0x90 kmem_cache_alloc_lru+0x180/0x440 __d_alloc+0x40/0x830 d_alloc+0x3c/0x1c8 d_alloc_parallel+0xe4/0xa00 __lookup_slow+0x11c/0x2e8 walk_component+0x1f8/0x498 path_lookupat+0x10c/0x590 filename_lookup+0x144/0x368 user_path_at_empty+0x54/0x88 do_readlinkat+0xcc/0x250 __arm64_sys_readlinkat+0x90/0xe0 do_el0_svc+0x124/0x318 el0_svc+0x34/0xe8 el0t_64_sync_handler+0x13c/0x158 el0t_64_sync+0x190/0x198 Last potentially related work creation: kasan_save_stack+0x2c/0x58 __kasan_record_aux_stack+0x9c/0xc8 kasan_record_aux_stack_noalloc+0x14/0x20 __call_rcu_common.constprop.0+0x74/0x5b0 call_rcu+0x18/0x30 dentry_free+0x9c/0x158 __dentry_kill+0x434/0x578 dput+0x30c/0x6b8 step_into+0xa50/0x1680 walk_component+0xb0/0x498 path_lookupat+0x10c/0x590 filename_lookup+0x144/0x368 user_path_at_empty+0x54/0x88 do_readlinkat+0xcc/0x250 __arm64_sys_readlinkat+0x90/0xe0 do_el0_svc+0x124/0x318 el0_svc+0x34/0xe8 el0t_64_sync_handler+0x13c/0x158 el0t_64_sync+0x190/0x198 Second to last potentially related work creation: kasan_save_stack+0x2c/0x58 __kasan_record_aux_stack+0x9c/0xc8 kasan_record_aux_stack_noalloc+0x14/0x20 __call_rcu_common.constprop.0+0x74/0x5b0 call_rcu+0x18/0x30 dentry_free+0x9c/0x158 __dentry_kill+0x434/0x578 dput+0x30c/0x6b8 step_into+0xa50/0x1680 walk_component+0xb0/0x498 path_lookupat+0x10c/0x590 filename_lookup+0x144/0x368 user_path_at_empty+0x54/0x88 do_readlinkat+0xcc/0x250 __arm64_sys_readlinkat+0x90/0xe0 do_el0_svc+0x124/0x318 el0_svc+0x34/0xe8 el0t_64_sync_handler+0x13c/0x158 el0t_64_sync+0x190/0x198 The buggy address belongs to the object at fffffc0176c88340 which belongs to the cache dentry of size 216 The buggy address is located 32 bytes inside of freed 216-byte region [fffffc0176c88340, fffffc0176c88418) The buggy address belongs to the physical page: page:ffffffff005db200 refcount:1 mapcount:0 mapping:0000000000000000 index:0xfffffc0176c86900 pfn:0x1b6c8 memcg:fffffc004195a001 flags: 0x1ffe000000000200(slab|node=0|zone=1|lastcpupid=0x7ff) page_type: 0xffffffff() raw: 1ffe000000000200 fffffc00e0016b80 ffffffff0052bb90 ffffffff00405350 raw: fffffc0176c86900 0000000000ea0037 00000001ffffffff fffffc004195a001 page dumped because: kasan: bad access detected Memory state around the buggy address: fffffc0176c88200: fc fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb fffffc0176c88280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >fffffc0176c88300: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ^ fffffc0176c88380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fffffc0176c88400: fb fb fb fc fc fc fc fc fc fc fc fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint XFS (sda2): Unmounting Filesystem ec3b0e05-f8c2-4fe5-a6cf-d8c5819c6ee7 XFS (sda3): EXPERIMENTAL online scrub feature in use. Use at your own risk! XFS (sda3): Unmounting Filesystem 287e61f0-a509-4dd1-b788-70076dda0efd memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=1343704 'xfs_repair' XFS (sda3): EXPERIMENTAL metadata directory feature in use. Use at your own risk! XFS (sda3): EXPERIMENTAL realtime allocation group feature in use. Use at your own risk! XFS (sda3): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! XFS (sda3): Mounting V5 Filesystem 287e61f0-a509-4dd1-b788-70076dda0efd XFS (sda3): Ending clean mount XFS (sda3): Unmounting Filesystem 287e61f0-a509-4dd1-b788-70076dda0efd run fstests generic/137 at 2023-09-11 22:06:59 spectre-v4 mitigation disabled by command-line option XFS (sda2): EXPERIMENTAL metadata directory feature in use. Use at your own risk! XFS (sda2): EXPERIMENTAL realtime allocation group feature in use. Use at your own risk! XFS (sda2): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! XFS (sda2): Mounting V5 Filesystem ec3b0e05-f8c2-4fe5-a6cf-d8c5819c6ee7 XFS (sda2): Ending clean mount XFS (sda2): Unmounting Filesystem ec3b0e05-f8c2-4fe5-a6cf-d8c5819c6ee7 XFS (sda2): EXPERIMENTAL metadata directory feature in use. Use at your own risk! XFS (sda2): EXPERIMENTAL realtime allocation group feature in use. Use at your own risk! XFS (sda2): EXPERIMENTAL parent pointer feature enabled. Use at your own risk! XFS (sda2): Mounting V5 Filesystem ec3b0e05-f8c2-4fe5-a6cf-d8c5819c6ee7 XFS (sda2): Ending clean mount Unable to handle kernel paging request at virtual address e0c83a00c0029201 KASAN: maybe wild-memory-access in range [0x0641e00600149008-0x0641e0060014900f] Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [e0c83a00c0029201] address between user and kernel address ranges Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison xfs ext2 mbcache dm_flakey dm_snapshot dm_bufio dm_zero nft_chain_nat xt_REDIRECT nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 rpcsec_gss_krb5 auth_rpcgss xt_tcpudp crct10dif_ce bfq ip_set_hash_ip ip_set_hash_net xt_set nft_compat ip_set_hash_mac nf_tables libcrc32c sch_fq_codel fuse configfs efivarfs ip_tables x_tables overlay nfsv4 [last unloaded: scsi_debug] CPU: 1 PID: 1348080 Comm: md5sum Tainted: G B 6.5.0-xfsa #12 7284e9be14b81c73627c489aa5df798cf1143960 Hardware name: QEMU KVM Virtual Machine, BIOS 1.5.1 06/16/2021 pstate: 40401005 (nZcv daif +PAN -UAO -TCO -DIT +SSBS BTYPE=--) pc : d_alloc_parallel+0x250/0xa00 lr : d_alloc_parallel+0x85c/0xa00 sp : fffffe008b8cf720 x29: fffffe008b8cf720 x28: 00c83c00c0029201 x27: fffffc00fa39e340 x26: 000000003c7f8e47 x25: dffffe0000000000 x24: fffffe0082ab9d90 x23: 0000000000000003 x22: fffffc014dd9df20 x21: 0641e006001490b0 x20: 0641e00600149008 x19: fffffe00825a55c0 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000001 x13: 80808080807fffff x12: fffffdc0105573b3 x11: 1fffffc0105573b2 x10: fffffdc0105573b2 x9 : dffffe0000000000 x8 : 0000023fefaa8c4e x7 : fffffe0082ab9d97 x6 : 0000000000000001 x5 : fffffe0082ab9d90 x4 : 0000000000000000 x3 : fffffe008070fad4 x2 : 0000000000000000 x1 : fffffe0082a9e9d8 x0 : 0000000000000000 Call trace: d_alloc_parallel+0x250/0xa00 path_openat+0x1030/0x23d0 do_filp_open+0x15c/0x338 do_sys_openat2+0x12c/0x168 __arm64_sys_openat+0x138/0x1d0 do_el0_svc+0x124/0x318 el0_svc+0x34/0xe8 el0t_64_sync_handler+0x13c/0x158 el0t_64_sync+0x190/0x198 Code: f94002b5 b4001375 d102a2b4 d343fe9c (38f96b80) ---[ end trace 0000000000000000 ]--- note: md5sum[1348080] exited with preempt_count 1 I tried popping /all/ the bugfixes, and for 6.5 that seemed to make fstests settle down. But then 6.6-rc1 came out, and the crashes returned. In addition, there's a new crash: run fstests xfs/711 at 2023-09-11 20:25:35 spectre-v4 mitigation disabled by command-line option XFS (sda3): Mounting V5 Filesystem add44358-2799-4361-b909-073e50305a53 XFS (sda3): Ending clean mount XFS (sda3): Quotacheck needed: Please wait. XFS (sda3): Quotacheck: Done. XFS (sda3): EXPERIMENTAL online scrub feature in use. Use at your own risk! Unable to handle kernel execute from non-executable memory at virtual address fffffe0082e85070 KASAN: probably user-memory-access in range [0x0000000417428380-0x0000000417428387] Mem abort info: ESR = 0x000000008600000f EC = 0x21: IABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x0f: level 3 permission fault swapper pgtable: 64k pages, 42-bit VAs, pgdp=0000000041d40000 [fffffe0082e85070] pgd=100000023fff0003, p4d=100000023fff0003, pud=100000023fff0003, pmd=100000023fff0003, pte=0068000042e80703 Internal error: Oops: 000000008600000f [#1] PREEMPT SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: xfs nft_chain_nat xt_REDIRECT nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 rpcsec_gss_krb5 auth_rpcgss xt_tcpudp ip_set_hash_ip ip_set_hash_net xt_set nft_compat ip_set_hash_mac ip_set bfq nf_tables libcrc32c crct10dif_ce nfnetlink sch_fq_codel fuse configfs efivarfs ip_tables x_tables overlay nfsv4 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.6.0-rc1-djwa #rc1 31f6bcd5927f495a7ed31b02fb778f37562278b5 Hardware name: QEMU KVM Virtual Machine, BIOS 1.5.1 06/16/2021 pstate: 20401005 (nzCv daif +PAN -UAO -TCO -DIT +SSBS BTYPE=--) pc : in_lookup_hashtable+0x16b0/0x2020 lr : rcu_core+0xadc/0x13a8 sp : fffffe0082f2fdc0 x29: fffffe0082f2fdc0 x28: fffffd801c1228a1 x27: fffffc00e0914500 x26: 0000000000000003 x25: fffffc01bea6aa38 x24: fffffe0082f2fea0 x23: 1fffffc0105e5fd0 x22: fffffe00826459f8 x21: dffffe0000000000 x20: 0000000000000003 x19: fffffc01bea6a9c0 x18: 0000000000000000 x17: fffffe013cac0000 x16: fffffe0082f20000 x15: 0000000000000000 x14: 0000000000000000 x13: 1fffffc0102ff4f1 x12: fffffdc0104dafc9 x11: 1fffffc0104dafc8 x10: fffffdc0104dafc8 x9 : fffffe00802188dc x8 : fffffd802037eb51 x7 : fffffe0082f2fab8 x6 : 0000000000000002 x5 : 0000000000000050 x4 : fffffc01bd7f34e8 x3 : 0000000000000118 x2 : fffffe0082e85070 x1 : fffffc0101bfa078 x0 : fffffc0101bfa078 Call trace: in_lookup_hashtable+0x16b0/0x2020 rcu_core_si+0x18/0x30 __do_softirq+0x280/0xad4 ____do_softirq+0x18/0x30 call_on_irq_stack+0x24/0x58 do_softirq_own_stack+0x24/0x38 irq_exit_rcu+0x198/0x238 el1_interrupt+0x38/0x58 el1h_64_irq_handler+0x18/0x28 el1h_64_irq+0x64/0x68 arch_local_irq_enable+0x4/0x8 do_idle+0x32c/0x450 cpu_startup_entry+0x2c/0x40 secondary_start_kernel+0x230/0x2e0 __secondary_switched+0xb8/0xc0 Code: 00000000 00000000 00000000 00000000 (01bfa078) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt SMP: stopping secondary CPUs Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled CPU features: 0x00000000,70020043,1001700b Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- IDGI. Has anyone else seen this sort of crash? They all seem to revolve around UAF bugs with dentries that look like they've been rcu-freed recently. Kent said he's affected by some crash on arm64 too, so I cc'd him. I also haven't any clue why this hasn't triggered at all on x86_64. --D