Re: [PATCH 2/2] generic/650: race mount and unmount with cpu hotplug too

On Fri, Sep 01, 2023 at 10:40:46PM -0700, Darrick J. Wong wrote:
> On Sat, Sep 02, 2023 at 01:33:58PM +0800, Zorro Lang wrote:
> > On Tue, Aug 29, 2023 at 04:08:03PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > 
> > > Ritesh Harjani reported that mount and unmount can race with the xfs cpu
> > > hotplug notifier hooks and crash the kernel.  Extend this test to
> > > include that.
> > > 
> > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > ---
> > 
> > Oh, it covers a new crash bug, right? I just hit it [1]. Is there a known fix
> > which can be specified by _fixed_by....?
> 
> https://lore.kernel.org/linux-xfs/ZO6J4W9msOixUk05@xxxxxxxxxxxxxxxxxxx/T/#t
> 
> Not merged yet, will ask Chandan to pull all my pending fixes next
> week.

Thanks, I'll add this link to the commit log. We can add the _fixed_by... later.
Or, if you prefer, I can hold this patchset for a week.

Thanks,
Zorro

> 
> --D
> 
> > Thanks,
> > Zorro
> > 
> > [1]
> > [12328.869261] run fstests generic/650 at 2023-09-01 21:29:58
> > [12330.643585] smpboot: CPU 38 is now offline
> > [-- MARK -- Sat Sep  2 01:30:00 2023]
> > [12332.435309] smpboot: CPU 164 is now offline
> > [12333.137984] smpboot: CPU 94 is now offline
> > [12333.818337] smpboot: CPU 63 is now offline
> > [12334.959559] smpboot: CPU 127 is now offline
> > [12335.631255] smpboot: CPU 160 is now offline
> > ....
> > ....
> > [12555.494184] smpboot: Booting Node 1 Processor 193 APIC 0xb3
> > [12556.213072] smpboot: CPU 170 is now offline
> > [12557.409451] smpboot: CPU 109 is now offline
> > [12558.013384] XFS (pmem1): Unmounting Filesystem 23992a48-9538-4c53-8312-becd4fcf4f0a
> > [12558.029879] smpboot: CPU 191 is now offline
> > [12558.074326] general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN NOPTI
> > [12558.085798] KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
> > [12558.093415] CPU: 180 PID: 3988051 Comm: 650 Kdump: loaded Not tainted 6.5.0+ #1
> > [12558.100768] Hardware name: HPE ProLiant DL380 Gen11/ProLiant DL380 Gen11, BIOS 1.32 03/23/2023
> > [12558.109430] RIP: 0010:xlog_cil_pcp_dead+0x2b/0x540 [xfs]
> > [12558.115080] Code: 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 c7 10 48 89 fa 48 c1 ea 03 48 83 ec 10 <80> 3c 02 00 0f 85 1e 04 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b
> > [12558.133964] RSP: 0018:ffa000003a0c7988 EFLAGS: 00010286
> > [12558.139224] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffffb3aa11c6
> > [12558.146402] RDX: 0000000000000002 RSI: 00000000000000bf RDI: 0000000000000010
> > [12558.153580] RBP: ff1100062ed14000 R08: 0000000000000000 R09: fffa3bfffef4319f
> > [12558.160759] R10: ffd1fffff7a18cff R11: 0000000000000000 R12: ff1100068dc5a180
> > [12558.167937] R13: 00000000000000bf R14: dffffc0000000000 R15: ff11003ff6c2f7e0
> > [12558.175115] FS:  00007f2fd250b740(0000) GS:ff11003ff4000000(0000) knlGS:0000000000000000
> > [12558.183254] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [12558.189035] CR2: 00000000023e1048 CR3: 00000006c7a58002 CR4: 0000000000771ee0
> > [12558.196211] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [12558.203388] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> > [12558.210565] PKRU: 55555554
> > [12558.213286] Call Trace:
> > [12558.215749]  <TASK>
> > [12558.217860]  ? die_addr+0x3d/0xa0
> > [12558.221202]  ? exc_general_protection+0x150/0x230
> > [12558.225943]  ? asm_exc_general_protection+0x22/0x30
> > [12558.230859]  ? __cancel_work_timer+0x216/0x460
> > [12558.235332]  ? xlog_cil_pcp_dead+0x2b/0x540 [xfs]
> > [12558.240221]  ? xfs_inodegc_cpu_dead+0x76/0x380 [xfs]
> > [12558.245380]  xfs_cpu_dead+0xab/0x120 [xfs]
> > [12558.249661]  ? __pfx_xfs_cpu_dead+0x10/0x10 [xfs]
> > [12558.254548]  cpuhp_invoke_callback+0x2f6/0x830
> > [12558.259022]  ? __pfx_iova_cpuhp_dead+0x10/0x10
> > [12558.263499]  ? __pfx___lock_release+0x10/0x10
> > [12558.267886]  __cpuhp_invoke_callback_range+0xcc/0x1c0
> > [12558.272970]  ? __pfx___cpuhp_invoke_callback_range+0x10/0x10
> > [12558.278661]  ? trace_cpuhp_exit+0x15e/0x1a0
> > [12558.282868]  ? cpuhp_kick_ap_work+0x1e6/0x370
> > [12558.287252]  _cpu_down+0x352/0x890
> > [12558.290678]  cpu_device_down+0x68/0xa0
> > [12558.294450]  device_offline+0x243/0x310
> > [12558.298311]  ? __pfx_device_offline+0x10/0x10
> > [12558.302694]  ? __pfx___mutex_lock+0x10/0x10
> > [12558.306904]  ? __pfx_lock_acquire+0x10/0x10
> > [12558.311114]  ? __pfx_sysfs_kf_write+0x10/0x10
> > [12558.315500]  online_store+0x87/0xf0
> > [12558.319009]  ? __pfx_online_store+0x10/0x10
> > [12558.323217]  ? __pfx_sysfs_kf_write+0x10/0x10
> > [12558.327600]  ? sysfs_file_ops+0xe0/0x170
> > [12558.331545]  ? sysfs_kf_write+0x3d/0x170
> > [12558.335493]  kernfs_fop_write_iter+0x355/0x530
> > [12558.339966]  vfs_write+0x7bd/0xc40
> > [12558.343390]  ? __pfx_vfs_write+0x10/0x10
> > [12558.347339]  ? local_clock_noinstr+0x9/0xc0
> > [12558.351550]  ? __fget_light+0x51/0x220
> > [12558.355326]  ksys_write+0xf1/0x1d0
> > [12558.358747]  ? __pfx_ksys_write+0x10/0x10
> > [12558.362779]  ? ktime_get_coarse_real_ts64+0x130/0x170
> > [12558.367866]  do_syscall_64+0x59/0x90
> > [12558.371464]  ? exc_page_fault+0xaa/0xe0
> > [12558.375323]  ? asm_exc_page_fault+0x22/0x30
> > [12558.379533]  ? lockdep_hardirqs_on+0x79/0x100
> > [12558.383917]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> > [12558.392600] Code: 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> > [12558.411482] RSP: 002b:00007fffc4c0d178 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > [12558.419098] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f2fd233eba7
> > [12558.426274] RDX: 0000000000000002 RSI: 0000560a773d8000 RDI: 0000000000000001
> > [12558.433451] RBP: 0000560a773d8000 R08: 0000000000000000 R09: 00007f2fd23b14e0
> > [12558.440630] R10: 00007f2fd23b13e0 R11: 0000000000000246 R12: 0000000000000002
> > [12558.447808] R13: 00007f2fd23fb780 R14: 0000000000000002 R15: 00007f2fd23f69e0
> > ...
> > 
> > 
> > >  tests/generic/650 |   13 ++++++++++---
> > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > > 
> > > 
> > > diff --git a/tests/generic/650 b/tests/generic/650
> > > index 05c939b84f..773f93c7cb 100755
> > > --- a/tests/generic/650
> > > +++ b/tests/generic/650
> > > @@ -67,11 +67,18 @@ fsstress_args=(-w -d $stress_dir)
> > >  nr_cpus=$((LOAD_FACTOR * nr_hotplug_cpus))
> > >  test "$nr_cpus" -gt 1024 && nr_cpus="$nr_hotplug_cpus"
> > >  fsstress_args+=(-p $nr_cpus)
> > > -test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
> > > +if [ -n "$SOAK_DURATION" ]; then
> > > +	test "$SOAK_DURATION" -lt 10 && SOAK_DURATION=10
> > > +	fsstress_args+=(--duration="$((SOAK_DURATION / 10))")
> > > +fi
> > >  
> > > -nr_ops=$((25000 * TIME_FACTOR))
> > > +nr_ops=$((2500 * TIME_FACTOR))
> > >  fsstress_args+=(-n $nr_ops)
> > > -$FSSTRESS_PROG $FSSTRESS_AVOID -w "${fsstress_args[@]}" >> $seqres.full
> > > +for ((i = 0; i < 10; i++)); do
> > > +	$FSSTRESS_PROG $FSSTRESS_AVOID -w "${fsstress_args[@]}" >> $seqres.full
> > > +	_test_cycle_mount
> > > +done
> > > +
> > >  rm -f $sentinel_file
> > >  
> > >  # success, all done
> > > 
> > 
> 
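
For readers following the arithmetic in the hunk above: the patch splits one long fsstress run into ten shorter runs with a mount cycle after each, so the per-run duration becomes SOAK_DURATION / 10 (with SOAK_DURATION first raised to at least 10 so the quotient is never zero), and nr_ops drops from 25000 to 2500 per run. A minimal standalone sketch of the duration calculation follows. The names LOOPS and per_loop_duration are hypothetical and exist only for illustration; they are not part of fstests or of the patch.

```shell
#!/bin/bash
# Illustrative only: LOOPS and per_loop_duration are hypothetical names,
# not part of fstests or of the patch quoted above.
LOOPS=10

# Print the per-iteration duration for a total soak time given in seconds.
# Totals below LOOPS are raised to LOOPS first, mirroring the clamp in the
# patch, so the integer division never yields zero.
per_loop_duration() {
	local soak="$1"
	if [ "$soak" -lt "$LOOPS" ]; then
		soak="$LOOPS"
	fi
	echo $((soak / LOOPS))
}

per_loop_duration 5     # clamped to 10, then divided: prints 1
per_loop_duration 300   # prints 30
```

Because the division is integer division, a total that is not a multiple of 10 is rounded down per run, so the ten runs together can come in slightly under the requested soak time.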


