On Tue, Mar 14, 2023 at 09:31:00AM -0700, Darrick J. Wong wrote: > On Tue, Mar 14, 2023 at 05:06:49PM +0800, Ye Bin wrote: > > From: Ye Bin <yebin10@xxxxxxxxxx> > > > > There's a issue when do cpu offline test: > > CPU: 48 PID: 1168152 Comm: umount Kdump: loaded Tainted: G L > > pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--) > > pc : assfail+0x8c/0xb4 > > lr : assfail+0x38/0xb4 > > sp : ffffa00033ce7c40 > > x29: ffffa00033ce7c40 x28: ffffa00014794f30 > > x27: ffffa00014f6ca20 x26: 1fffe0120b2e2030 > > x25: ffff009059710188 x24: ffff00886c0a4650 > > x23: 1fffe0110d8148ca x22: ffff009059710180 > > x21: ffffa00015155680 x20: ffff00886c0a4000 > > x19: 0000000000000001 x18: 0000000000000000 > > x17: 0000000000000000 x16: 0000000000000000 > > x15: 0000000000000007 x14: 1fffe00304cef265 > > x13: ffff00182642b200 x12: ffff8012d37757bf > > x11: 1fffe012d37757be x10: ffff8012d37757be > > x9 : ffffa00010603a0c x8 : 0000000041b58ab3 > > x7 : ffff94000679cf44 x6 : 00000000ffffffc0 > > x5 : 0000000000000021 x4 : 00000000ffffffca > > x3 : 1ffff40002a27ee1 x2 : 0000000000000004 > > x1 : 0000000000000000 x0 : ffffa0001513f000 > > Call trace: > > assfail+0x8c/0xb4 > > xfs_destroy_percpu_counters+0x98/0xa4 > > xfs_fs_put_super+0x1a0/0x2a4 > > generic_shutdown_super+0x104/0x2c0 > > kill_block_super+0x8c/0xf4 > > deactivate_locked_super+0xa4/0x164 > > deactivate_super+0xb0/0xdc > > cleanup_mnt+0x29c/0x3ec > > __cleanup_mnt+0x1c/0x30 > > task_work_run+0xe0/0x200 > > do_notify_resume+0x244/0x320 > > work_pending+0xc/0xa0 > > > > We analyzed the data in vmcore is correct. But triggered above issue. > > As f689054aace2 ("percpu_counter: add percpu_counter_sum_all interface") > > commit describes there is a small race window between the online CPUs traversal > > of percpu_counter_sum and the CPU offline callback. This means percpu_counter_sum() > > may return incorrect result during cpu offline. > > To solve above issue use percpu_counter_sum_all() interface to make sure > > result is correct to prevent false triggering of assertions. > > How about the other percpu_counter_sum callsites inside XFS? Some of > them are involved in writing ondisk metadata (xfs_log_sb) or doing > correctness checks (fs/xfs/scrub/*); shouldn't those also be using the > _all variant? Ugh. I kinda wish that the percpu_counter_sum_all() patch had been cc'd to lists for subsystems that use percpu_counter_sum() extensively, or just to people who have modified that code in the past. The problem is that it uses cpu_possible_mask, which means it walks all possible CPUs that can be added to the system even if the CPUs aren't physically present. That can be a lot in the case of systems that can have cpu-capable nodes hotplugged into them, and that makes percpu_counter_sum_all() excitingly expensive for no good reason. AFAICT, if we are trying to close a race condition between iterating online CPUs not summing dying CPUs and the cpu dead notification updating the sum, then shouldn't we be using cpu_mask_or(cpu_online_mask, cpu_dying_mask) for summing iteration rather than just cpu_online_mask? i.e. when a CPU is being taken down, it gets added to the cpu_dying_mask, then removed from the cpu_online_mask, then the offline notifications are run (i.e. the percpu counter dead callback), and when the CPU reaches the CPUHP_TEARDOWN_CPU state, it is removed from the cpu_dying_mask because it is now dead and all the "cpu dying" callbacks have been run. Except when a hotplug event is being processed, cpu_dying_mask will be empty, hence there is little change in summing overhead. But it will close the summing race condition when a CPU is being offlined... That, I think, is the solution we want for XFS. Having the percpu counters just do the right thing is far better than always having to wonder if summation interface we are using is correct in the face of CPU hotplug. I'll put a patchset together to do: 1. fix percpu_counter_sum() to include the dying mask in it's iteration. This should fix the XFS issue. 2. change the only user of percpu_counter_sum_all() to only use percpu_counter_sum() because percpu_counter_sum_all() is now redundant. 3. remove percpu_counter_sum_all() because it is unused. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx