On Tue, Mar 14, 2023 at 05:06:49PM +0800, Ye Bin wrote:
> From: Ye Bin <yebin10@xxxxxxxxxx>
>
> There's a issue when do cpu offline test:
> CPU: 48 PID: 1168152 Comm: umount Kdump: loaded Tainted: G L
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> pc : assfail+0x8c/0xb4
> lr : assfail+0x38/0xb4
> sp : ffffa00033ce7c40
> x29: ffffa00033ce7c40 x28: ffffa00014794f30
> x27: ffffa00014f6ca20 x26: 1fffe0120b2e2030
> x25: ffff009059710188 x24: ffff00886c0a4650
> x23: 1fffe0110d8148ca x22: ffff009059710180
> x21: ffffa00015155680 x20: ffff00886c0a4000
> x19: 0000000000000001 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000
> x15: 0000000000000007 x14: 1fffe00304cef265
> x13: ffff00182642b200 x12: ffff8012d37757bf
> x11: 1fffe012d37757be x10: ffff8012d37757be
> x9 : ffffa00010603a0c x8 : 0000000041b58ab3
> x7 : ffff94000679cf44 x6 : 00000000ffffffc0
> x5 : 0000000000000021 x4 : 00000000ffffffca
> x3 : 1ffff40002a27ee1 x2 : 0000000000000004
> x1 : 0000000000000000 x0 : ffffa0001513f000
> Call trace:
> assfail+0x8c/0xb4
> xfs_destroy_percpu_counters+0x98/0xa4
> xfs_fs_put_super+0x1a0/0x2a4
> generic_shutdown_super+0x104/0x2c0
> kill_block_super+0x8c/0xf4
> deactivate_locked_super+0xa4/0x164
> deactivate_super+0xb0/0xdc
> cleanup_mnt+0x29c/0x3ec
> __cleanup_mnt+0x1c/0x30
> task_work_run+0xe0/0x200
> do_notify_resume+0x244/0x320
> work_pending+0xc/0xa0
>
> We analyzed the data in vmcore is correct. But triggered above issue.
> As f689054aace2 ("percpu_counter: add percpu_counter_sum_all interface")
> commit describes there is a small race window between the online CPUs traversal
> of percpu_counter_sum and the CPU offline callback. This means percpu_counter_sum()
> may return incorrect result during cpu offline.
> To solve above issue use percpu_counter_sum_all() interface to make sure
> result is correct to prevent false triggering of assertions.

How about the other percpu_counter_sum callsites inside XFS?  Some of
them are involved in writing ondisk metadata (xfs_log_sb) or doing
correctness checks (fs/xfs/scrub/*); shouldn't those also be using the
_all variant?

--D

> Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
> ---
> fs/xfs/xfs_super.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 2479b5cbd75e..c0ce66f966ee 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1076,7 +1076,7 @@ xfs_destroy_percpu_counters(
>  	percpu_counter_destroy(&mp->m_ifree);
>  	percpu_counter_destroy(&mp->m_fdblocks);
>  	ASSERT(xfs_is_shutdown(mp) ||
> -	       percpu_counter_sum(&mp->m_delalloc_blks) == 0);
> +	       percpu_counter_sum_all(&mp->m_delalloc_blks) == 0);
>  	percpu_counter_destroy(&mp->m_delalloc_blks);
>  	percpu_counter_destroy(&mp->m_frextents);
>  }
> --
> 2.31.1
>
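
For anyone following along, here is a rough userspace model of the race
window described in the quoted commit message. It is only a sketch, not
kernel code: every name in it (model_counter, model_sum, model_sum_all,
model_fold_dead_cpu, model_cpu_online, MODEL_NR_CPUS) is made up for
illustration, and it assumes that the plain sum visits only CPUs still
marked online while the _all variant visits every possible CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical number of possible CPUs */

/* Hypothetical userspace stand-in for a per-CPU counter. */
struct model_counter {
	long count;			/* global, already-folded part */
	long percpu[MODEL_NR_CPUS];	/* per-CPU deltas not yet folded */
};

/* Hypothetical stand-in for the online CPU mask. */
static bool model_cpu_online[MODEL_NR_CPUS] = { true, true, true, true };

/* Models the plain sum: only CPUs still marked online are visited. */
static long model_sum(const struct model_counter *c)
{
	long ret = c->count;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_cpu_online[cpu])
			ret += c->percpu[cpu];
	return ret;
}

/* Models the _all variant: every possible CPU is visited. */
static long model_sum_all(const struct model_counter *c)
{
	long ret = c->count;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		ret += c->percpu[cpu];
	return ret;
}

/* Models the offline callback folding a dead CPU's delta into count. */
static void model_fold_dead_cpu(struct model_counter *c, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	c->count += c->percpu[cpu];
	c->percpu[cpu] = 0;
}
```

In this model, start with count = 5 and percpu[1] = -5, so the true
total is zero. Clearing model_cpu_online[1] before calling
model_fold_dead_cpu() reproduces the window: model_sum() reports 5
while model_sum_all() reports 0. Once the fold has run, both agree
again. That is the kind of transient nonzero value that would trip an
ASSERT(... == 0) even though the counter itself is consistent.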