"Ritesh Harjani (IBM)" <ritesh.list@xxxxxxxxx> writes: > An async dio write to a sparse file can generate a lot of extents > and when we unlink this file (using rm), the kernel can be busy in umapping > and freeing those extents as part of transaction processing. > Add cond_resched() in xfs_bunmapi_range() to avoid soft lockups > messages like these. Here is a call trace of such a soft lockup. > > watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435] > CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S L 6.9.0-rc5-0-default #1 > Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker > NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290 > LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290 > Call Trace: > xfs_alloc_get_rec+0x54/0x1b0 (unreliable) > xfs_alloc_compute_aligned+0x5c/0x144 > xfs_alloc_ag_vextent_size+0x238/0x8d4 > xfs_alloc_fix_freelist+0x540/0x694 > xfs_free_extent_fix_freelist+0x84/0xe0 > __xfs_free_extent+0x74/0x1ec > xfs_extent_free_finish_item+0xcc/0x214 > xfs_defer_finish_one+0x194/0x388 > xfs_defer_finish_noroll+0x1b4/0x5c8 > xfs_defer_finish+0x2c/0xc4 > xfs_bunmapi_range+0xa4/0x100 > xfs_itruncate_extents_flags+0x1b8/0x2f4 > xfs_inactive_truncate+0xe0/0x124 > xfs_inactive+0x30c/0x3e0 > xfs_inodegc_worker+0x140/0x234 > process_scheduled_works+0x240/0x57c > worker_thread+0x198/0x468 > kthread+0x138/0x140 > start_kernel_thread+0x14/0x18 > My v1 patch had cond_resched() in xfs_defer_finish_noroll, since I was suspecting that it's a common point where we loop for many other operations. And initially Dave also suggested for the same [1]. But I was not totally convinced given the only problematic path I had till now was in unmapping extents. So this patch keeps the cond_resched() in xfs_bunmapi_range() loop. [1]: https://lore.kernel.org/all/ZZ8OaNnp6b%2FPJzsb@xxxxxxxxxxxxxxxxxxx/ However, I was able to reproduce a problem with reflink remapping path both on Power (with 64k bs) and on x86 (with preempt=none and with KASAN enabled). I actually noticed while I was doing regression testing of some of the iomap changes with KASAN enabled. The issue was seen with generic/175 for both on Power and x86. Do you think we should keep the cond_resched() inside xfs_defer_finish_noroll() loop like we had in v1 [2]. If yes, then I can rebase v1 on the latest upstream tree and also update the commit msg with both call stacks. [2]: https://lore.kernel.org/all/0bfaf740a2d10cc846616ae05963491316850c52.1713674899.git.ritesh.list@xxxxxxxxx/ <call stack on Power> ====================== run fstests generic/175 at 2024-02-02 04:40:21 <...> [ C17] watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679] watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679] CPU: 17 PID: 7679 Comm: xfs_io Kdump: loaded Tainted: G X 6.4.0-150600.5-default #1 NIP [c008000005e3ec94] xfs_rmapbt_diff_two_keys+0x54/0xe0 [xfs] LR [c008000005e08798] xfs_btree_get_leaf_keys+0x110/0x1e0 [xfs] Call Trace: 0xc000000014107c00 (unreliable) __xfs_btree_updkeys+0x8c/0x2c0 [xfs] xfs_btree_update_keys+0x150/0x170 [xfs] xfs_btree_lshift+0x534/0x660 [xfs] xfs_btree_make_block_unfull+0x19c/0x240 [xfs] xfs_btree_insrec+0x4e4/0x630 [xfs] xfs_btree_insert+0x104/0x2d0 [xfs] xfs_rmap_insert+0xc4/0x260 [xfs] xfs_rmap_map_shared+0x228/0x630 [xfs] xfs_rmap_finish_one+0x2d4/0x350 [xfs] xfs_rmap_update_finish_item+0x44/0xc0 [xfs] xfs_defer_finish_noroll+0x2e4/0x740 [xfs] __xfs_trans_commit+0x1f4/0x400 [xfs] xfs_reflink_remap_extent+0x2d8/0x650 [xfs] xfs_reflink_remap_blocks+0x154/0x320 [xfs] xfs_file_remap_range+0x138/0x3a0 [xfs] do_clone_file_range+0x11c/0x2f0 vfs_clone_file_range+0x60/0x1c0 ioctl_file_clone+0x78/0x140 sys_ioctl+0x934/0x1270 system_call_exception+0x158/0x320 system_call_vectored_common+0x15c/0x2ec <call stack on x86 with KASAN> =============================== watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [xfs_io:3438095] CPU: 6 PID: 3438095 Comm: xfs_io Not tainted 6.9.0-rc5-xfstests-perf-00008-g4e2752e99f55 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.o4 RIP: 0010:_raw_spin_unlock_irqrestore+0x3c/0x60 Code: 10 48 89 fb 48 83 c7 18 e8 31 7c 10 fd 48 89 df e8 79 f5 10 fd f7 c5 00 02 00 00 74 06 e8 0c 6 Call Trace: <IRQ> ? watchdog_timer_fn+0x2dc/0x3a0 ? __pfx_watchdog_timer_fn+0x10/0x10 ? __hrtimer_run_queues+0x4a1/0x870 ? __pfx___hrtimer_run_queues+0x10/0x10 ? kvm_clock_get_cycles+0x18/0x30 ? ktime_get_update_offsets_now+0xc6/0x2f0 ? hrtimer_interrupt+0x2b8/0x7a0 ? __sysvec_apic_timer_interrupt+0xca/0x390 ? sysvec_apic_timer_interrupt+0x65/0x80 </IRQ> <TASK> ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 ? _raw_spin_unlock_irqrestore+0x34/0x60 ? _raw_spin_unlock_irqrestore+0x3c/0x60 ? _raw_spin_unlock_irqrestore+0x34/0x60 get_partial_node.part.0+0x1af/0x340 ___slab_alloc+0xc07/0x1250 ? do_vfs_ioctl+0xe5c/0x1660 ? __x64_sys_ioctl+0xd5/0x1b0 ? __alloc_object+0x39/0x660 ? __pfx___might_resched+0x10/0x10 ? __alloc_object+0x39/0x660 ? kmem_cache_alloc+0x3cd/0x410 ? should_failslab+0xe/0x20 kmem_cache_alloc+0x3cd/0x410 __alloc_object+0x39/0x660 __create_object+0x22/0x90 kmem_cache_alloc+0x324/0x410 xfs_bui_init+0x1b/0x150 xfs_bmap_update_create_intent+0x48/0x110 ? __pfx_xfs_bmap_update_create_intent+0x10/0x10 xfs_defer_create_intent+0xcc/0x1b0 xfs_defer_create_intents+0x8f/0x230 xfs_defer_finish_noroll+0x1c0/0x1160 ? xfs_inode_item_precommit+0x2c1/0x880 ? __create_object+0x5e/0x90 ? __pfx_xfs_defer_finish_noroll+0x10/0x10 ? xfs_trans_run_precommits+0x126/0x200 __xfs_trans_commit+0x767/0xbe0 ? inode_maybe_inc_iversion+0xe2/0x150 ? __pfx___xfs_trans_commit+0x10/0x10 xfs_reflink_remap_extent+0x654/0xd40 ? __pfx_xfs_reflink_remap_extent+0x10/0x10 ? __pfx_down_read_nested+0x10/0x10 xfs_reflink_remap_blocks+0x21a/0x850 ? __pfx_xfs_reflink_remap_blocks+0x10/0x10 ? _raw_spin_unlock+0x23/0x40 ? xfs_reflink_remap_prep+0x47d/0x900 xfs_file_remap_range+0x296/0xb40 ? __pfx_xfs_file_remap_range+0x10/0x10 ? __pfx_lock_acquire+0x10/0x10 ? __pfx___might_resched+0x10/0x10 vfs_clone_file_range+0x260/0xc20 ioctl_file_clone+0x49/0xb0 do_vfs_ioctl+0xe5c/0x1660 ? __pfx_do_vfs_ioctl+0x10/0x10 ? trace_irq_enable.constprop.0+0xd2/0x110 ? kasan_quarantine_put+0x7e/0x1d0 ? do_sys_openat2+0x120/0x170 ? lock_acquire+0x43b/0x4f0 ? __pfx_lock_release+0x10/0x10 ? __pfx_do_sys_openat2+0x10/0x10 ? __do_sys_newfstatat+0x94/0xe0 ? __fget_files+0x1ce/0x330 __x64_sys_ioctl+0xd5/0x1b0 do_syscall_64+0x6a/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7ffff7d1a94f </TASK> -ritesh > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> > cc: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> > --- > fs/xfs/libxfs/xfs_bmap.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c > index 656c95a22f2e..44d5381bc66f 100644 > --- a/fs/xfs/libxfs/xfs_bmap.c > +++ b/fs/xfs/libxfs/xfs_bmap.c > @@ -6354,6 +6354,7 @@ xfs_bunmapi_range( > error = xfs_defer_finish(tpp); > if (error) > goto out; > + cond_resched(); > } > out: > return error; > -- > 2.44.0