On Fri, Sep 01, 2023 at 10:40:46PM -0700, Darrick J. Wong wrote:
> On Sat, Sep 02, 2023 at 01:33:58PM +0800, Zorro Lang wrote:
> > On Tue, Aug 29, 2023 at 04:08:03PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > >
> > > Ritesh Harjani reported that mount and unmount can race with the xfs cpu
> > > hotplug notifier hooks and crash the kernel. Extend this test to
> > > include that.
> > >
> > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > ---
> >
> > Oh, it covers a new crash bug, right? I just hit it [1]. Is there a known fix
> > which can be specified by _fixed_by....?
>
> https://lore.kernel.org/linux-xfs/ZO6J4W9msOixUk05@xxxxxxxxxxxxxxxxxxx/T/#t
>
> Not merged yet, will ask Chandan to pull all my pending fixes next
> week.

Thanks, I'll add this link to the commit log. We can add the _fixed_by... later.
Or, if you like, I can hold this patchset for a week.

Thanks,
Zorro

>
> --D
>
> > Thanks,
> > Zorro
> >
> > [1]
> > [12328.869261] run fstests generic/650 at 2023-09-01 21:29:58
> > [12330.643585] smpboot: CPU 38 is now offline
> > [-- MARK -- Sat Sep 2 01:30:00 2023]
> > [12332.435309] smpboot: CPU 164 is now offline
> > [12333.137984] smpboot: CPU 94 is now offline
> > [12333.818337] smpboot: CPU 63 is now offline
> > [12334.959559] smpboot: CPU 127 is now offline
> > [12335.631255] smpboot: CPU 160 is now offline
> > ....
> > ....
> > [12555.494184] smpboot: Booting Node 1 Processor 193 APIC 0xb3
> > [12556.213072] smpboot: CPU 170 is now offline
> > [12557.409451] smpboot: CPU 109 is now offline
> > [12558.013384] XFS (pmem1): Unmounting Filesystem 23992a48-9538-4c53-8312-becd4fcf4f0a
> > [12558.029879] smpboot: CPU 191 is now offline
> > [12558.074326] general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN NOPTI
> > [12558.085798] KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
> > [12558.093415] CPU: 180 PID: 3988051 Comm: 650 Kdump: loaded Not tainted 6.5.0+ #1
> > [12558.100768] Hardware name: HPE ProLiant DL380 Gen11/ProLiant DL380 Gen11, BIOS 1.32 03/23/2023
> > [12558.109430] RIP: 0010:xlog_cil_pcp_dead+0x2b/0x540 [xfs]
> > [12558.115080] Code: 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 c7 10 48 89 fa 48 c1 ea 03 48 83 ec 10 <80> 3c 02 00 0f 85 1e 04 00 00 48 b8 00 00
> > 00 00 00 fc ff df 48 8b
> > [12558.133964] RSP: 0018:ffa000003a0c7988 EFLAGS: 00010286
> > [12558.139224] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffffb3aa11c6
> > [12558.146402] RDX: 0000000000000002 RSI: 00000000000000bf RDI: 0000000000000010
> > [12558.153580] RBP: ff1100062ed14000 R08: 0000000000000000 R09: fffa3bfffef4319f
> > [12558.160759] R10: ffd1fffff7a18cff R11: 0000000000000000 R12: ff1100068dc5a180
> > [12558.167937] R13: 00000000000000bf R14: dffffc0000000000 R15: ff11003ff6c2f7e0
> > [12558.175115] FS: 00007f2fd250b740(0000) GS:ff11003ff4000000(0000) knlGS:0000000000000000
> > [12558.183254] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [12558.189035] CR2: 00000000023e1048 CR3: 00000006c7a58002 CR4: 0000000000771ee0
> > [12558.196211] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [12558.203388] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> > [12558.210565] PKRU: 55555554
> > [12558.213286] Call Trace:
> > [12558.215749] <TASK>
> > [12558.217860] ? die_addr+0x3d/0xa0
> > [12558.221202] ? exc_general_protection+0x150/0x230
> > [12558.225943] ? asm_exc_general_protection+0x22/0x30
> > [12558.230859] ? __cancel_work_timer+0x216/0x460
> > [12558.235332] ? xlog_cil_pcp_dead+0x2b/0x540 [xfs]
> > [12558.240221] ? xfs_inodegc_cpu_dead+0x76/0x380 [xfs]
> > [12558.245380] xfs_cpu_dead+0xab/0x120 [xfs]
> > [12558.249661] ? __pfx_xfs_cpu_dead+0x10/0x10 [xfs]
> > [12558.254548] cpuhp_invoke_callback+0x2f6/0x830
> > [12558.259022] ? __pfx_iova_cpuhp_dead+0x10/0x10
> > [12558.263499] ? __pfx___lock_release+0x10/0x10
> > [12558.267886] __cpuhp_invoke_callback_range+0xcc/0x1c0
> > [12558.272970] ? __pfx___cpuhp_invoke_callback_range+0x10/0x10
> > [12558.278661] ? trace_cpuhp_exit+0x15e/0x1a0
> > [12558.282868] ? cpuhp_kick_ap_work+0x1e6/0x370
> > [12558.287252] _cpu_down+0x352/0x890
> > [12558.290678] cpu_device_down+0x68/0xa0
> > [12558.294450] device_offline+0x243/0x310
> > [12558.298311] ? __pfx_device_offline+0x10/0x10
> > [12558.302694] ? __pfx___mutex_lock+0x10/0x10
> > [12558.306904] ? __pfx_lock_acquire+0x10/0x10
> > [12558.311114] ? __pfx_sysfs_kf_write+0x10/0x10
> > [12558.315500] online_store+0x87/0xf0
> > [12558.319009] ? __pfx_online_store+0x10/0x10
> > [12558.323217] ? __pfx_sysfs_kf_write+0x10/0x10
> > [12558.327600] ? sysfs_file_ops+0xe0/0x170
> > [12558.331545] ? sysfs_kf_write+0x3d/0x170
> > [12558.335493] kernfs_fop_write_iter+0x355/0x530
> > [12558.339966] vfs_write+0x7bd/0xc40
> > [12558.343390] ? __pfx_vfs_write+0x10/0x10
> > [12558.347339] ? local_clock_noinstr+0x9/0xc0
> > [12558.351550] ? __fget_light+0x51/0x220
> > [12558.355326] ksys_write+0xf1/0x1d0
> > [12558.358747] ? __pfx_ksys_write+0x10/0x10
> > [12558.362779] ? ktime_get_coarse_real_ts64+0x130/0x170
> > [12558.367866] do_syscall_64+0x59/0x90
> > [12558.371464] ? exc_page_fault+0xaa/0xe0
> > [12558.375323] ? asm_exc_page_fault+0x22/0x30
> > [12558.379533] ? lockdep_hardirqs_on+0x79/0x100
> > [12558.383917] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > [12558.392600] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48
> > 89 54 24 18 48 89 74 24
> > [12558.411482] RSP: 002b:00007fffc4c0d178 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > [12558.419098] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f2fd233eba7
> > [12558.426274] RDX: 0000000000000002 RSI: 0000560a773d8000 RDI: 0000000000000001
> > [12558.433451] RBP: 0000560a773d8000 R08: 0000000000000000 R09: 00007f2fd23b14e0
> > [12558.440630] R10: 00007f2fd23b13e0 R11: 0000000000000246 R12: 0000000000000002
> > [12558.447808] R13: 00007f2fd23fb780 R14: 0000000000000002 R15: 00007f2fd23f69e0
> > ...
> >
> >
> > > tests/generic/650 | 13 ++++++++++---
> > > 1 file changed, 10 insertions(+), 3 deletions(-)
> > >
> > >
> > > diff --git a/tests/generic/650 b/tests/generic/650
> > > index 05c939b84f..773f93c7cb 100755
> > > --- a/tests/generic/650
> > > +++ b/tests/generic/650
> > > @@ -67,11 +67,18 @@ fsstress_args=(-w -d $stress_dir)
> > >  nr_cpus=$((LOAD_FACTOR * nr_hotplug_cpus))
> > >  test "$nr_cpus" -gt 1024 && nr_cpus="$nr_hotplug_cpus"
> > >  fsstress_args+=(-p $nr_cpus)
> > > -test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
> > > +if [ -n "$SOAK_DURATION" ]; then
> > > +	test "$SOAK_DURATION" -lt 10 && SOAK_DURATION=10
> > > +	fsstress_args+=(--duration="$((SOAK_DURATION / 10))")
> > > +fi
> > >
> > > -nr_ops=$((25000 * TIME_FACTOR))
> > > +nr_ops=$((2500 * TIME_FACTOR))
> > >  fsstress_args+=(-n $nr_ops)
> > > -$FSSTRESS_PROG $FSSTRESS_AVOID -w "${fsstress_args[@]}" >> $seqres.full
> > > +for ((i = 0; i < 10; i++)); do
> > > +	$FSSTRESS_PROG $FSSTRESS_AVOID -w "${fsstress_args[@]}" >> $seqres.full
> > > +	_test_cycle_mount
> > > +done
> > > +
> > >  rm -f $sentinel_file
> > >
> > >  # success, all done
> > >
> >
>
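
For reference, a minimal sketch of the arithmetic in the quoted hunk: the soak time is clamped to at least 10 and then divided across the ten fsstress/mount cycles. This assumes SOAK_DURATION is a whole number of seconds. The helper name per_cycle_duration is hypothetical and is not part of fstests.

```shell
#!/bin/bash
# Hypothetical helper (not part of fstests): print the per-cycle fsstress
# duration for a soak time given in whole seconds, mirroring the clamp and
# integer division in the quoted generic/650 hunk.
per_cycle_duration() {
	local soak="$1"
	local cycles=10

	# Clamp so the integer division below never yields zero.
	if [ "$soak" -lt "$cycles" ]; then
		soak="$cycles"
	fi
	echo "$((soak / cycles))"
}

per_cycle_duration 5
per_cycle_duration 95
```

Integer division truncates, so the ten cycles together can run up to nine seconds shorter than the requested soak time.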