On Mon, Apr 29, 2024 at 02:14:46PM +0530, Ritesh Harjani wrote:
> "Ritesh Harjani (IBM)" <ritesh.list@xxxxxxxxx> writes:
> 
> > An async dio write to a sparse file can generate a lot of extents,
> > and when we unlink this file (using rm), the kernel can be busy in
> > unmapping and freeing those extents as part of transaction processing.
> > Add cond_resched() in xfs_bunmapi_range() to avoid soft lockup
> > messages like these. Here is a call trace of such a soft lockup.
> >
> > watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
> > CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S L 6.9.0-rc5-0-default #1
> > Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
> > NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
> > LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
> > Call Trace:
> > xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
> > xfs_alloc_compute_aligned+0x5c/0x144
> > xfs_alloc_ag_vextent_size+0x238/0x8d4
> > xfs_alloc_fix_freelist+0x540/0x694
> > xfs_free_extent_fix_freelist+0x84/0xe0
> > __xfs_free_extent+0x74/0x1ec
> > xfs_extent_free_finish_item+0xcc/0x214
> > xfs_defer_finish_one+0x194/0x388
> > xfs_defer_finish_noroll+0x1b4/0x5c8
> > xfs_defer_finish+0x2c/0xc4
> > xfs_bunmapi_range+0xa4/0x100
> > xfs_itruncate_extents_flags+0x1b8/0x2f4
> > xfs_inactive_truncate+0xe0/0x124
> > xfs_inactive+0x30c/0x3e0
> > xfs_inodegc_worker+0x140/0x234
> > process_scheduled_works+0x240/0x57c
> > worker_thread+0x198/0x468
> > kthread+0x138/0x140
> > start_kernel_thread+0x14/0x18
> 
> My v1 patch had cond_resched() in xfs_defer_finish_noroll(), since I was
> suspecting that it's a common point where we loop for many other
> operations, and initially Dave also suggested the same [1].
> But I was not totally convinced, given that the only problematic path I
> had until now was in unmapping extents. So this patch keeps the
> cond_resched() in the xfs_bunmapi_range() loop.
> 
> [1]: https://lore.kernel.org/all/ZZ8OaNnp6b%2FPJzsb@xxxxxxxxxxxxxxxxxxx/
> 
> However, I was able to reproduce a problem with the reflink remapping
> path both on Power (with 64k bs) and on x86 (with preempt=none and with
> KASAN enabled). I actually noticed it while I was doing regression
> testing of some of the iomap changes with KASAN enabled. The issue was
> seen with generic/175 both on Power and x86.
> 
> Do you think we should keep the cond_resched() inside the
> xfs_defer_finish_noroll() loop like we had in v1 [2]? If yes, then I can
> rebase v1 on the latest upstream tree and also update the commit msg
> with both call stacks.

I think there ought to be one in xfs_defer_finish_noroll (or even
xfs_trans_roll) when we're between dirty transactions, since long
running transaction chains can indeed stall.  That'll hopefully solve
the problem for bunmapi and long running online repair operations.

But that said, the main FICLONE loop continually creates new
transaction chains, so you probably need one there too.  Hence the
grumblings about cond_resched whackamole.

Thoughts?

--D

> [2]: https://lore.kernel.org/all/0bfaf740a2d10cc846616ae05963491316850c52.1713674899.git.ritesh.list@xxxxxxxxx/
> 
> <call stack on Power>
> ======================
> run fstests generic/175 at 2024-02-02 04:40:21
> <...>
> [ C17] watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
> watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
> CPU: 17 PID: 7679 Comm: xfs_io Kdump: loaded Tainted: G X 6.4.0-150600.5-default #1
> NIP [c008000005e3ec94] xfs_rmapbt_diff_two_keys+0x54/0xe0 [xfs]
> LR [c008000005e08798] xfs_btree_get_leaf_keys+0x110/0x1e0 [xfs]
> Call Trace:
> 0xc000000014107c00 (unreliable)
> __xfs_btree_updkeys+0x8c/0x2c0 [xfs]
> xfs_btree_update_keys+0x150/0x170 [xfs]
> xfs_btree_lshift+0x534/0x660 [xfs]
> xfs_btree_make_block_unfull+0x19c/0x240 [xfs]
> xfs_btree_insrec+0x4e4/0x630 [xfs]
> xfs_btree_insert+0x104/0x2d0 [xfs]
> xfs_rmap_insert+0xc4/0x260 [xfs]
> xfs_rmap_map_shared+0x228/0x630 [xfs]
> xfs_rmap_finish_one+0x2d4/0x350 [xfs]
> xfs_rmap_update_finish_item+0x44/0xc0 [xfs]
> xfs_defer_finish_noroll+0x2e4/0x740 [xfs]
> __xfs_trans_commit+0x1f4/0x400 [xfs]
> xfs_reflink_remap_extent+0x2d8/0x650 [xfs]
> xfs_reflink_remap_blocks+0x154/0x320 [xfs]
> xfs_file_remap_range+0x138/0x3a0 [xfs]
> do_clone_file_range+0x11c/0x2f0
> vfs_clone_file_range+0x60/0x1c0
> ioctl_file_clone+0x78/0x140
> sys_ioctl+0x934/0x1270
> system_call_exception+0x158/0x320
> system_call_vectored_common+0x15c/0x2ec
> 
> 
> <call stack on x86 with KASAN>
> ===============================
> watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [xfs_io:3438095]
> CPU: 6 PID: 3438095 Comm: xfs_io Not tainted 6.9.0-rc5-xfstests-perf-00008-g4e2752e99f55 #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.o4
> RIP: 0010:_raw_spin_unlock_irqrestore+0x3c/0x60
> Code: 10 48 89 fb 48 83 c7 18 e8 31 7c 10 fd 48 89 df e8 79 f5 10 fd f7 c5 00 02 00 00 74 06 e8 0c 6
> Call Trace:
> <IRQ>
> ? watchdog_timer_fn+0x2dc/0x3a0
> ? __pfx_watchdog_timer_fn+0x10/0x10
> ? __hrtimer_run_queues+0x4a1/0x870
> ? __pfx___hrtimer_run_queues+0x10/0x10
> ? kvm_clock_get_cycles+0x18/0x30
> ? ktime_get_update_offsets_now+0xc6/0x2f0
> ? hrtimer_interrupt+0x2b8/0x7a0
> ? __sysvec_apic_timer_interrupt+0xca/0x390
> ? sysvec_apic_timer_interrupt+0x65/0x80
> </IRQ>
> <TASK>
> ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> ? _raw_spin_unlock_irqrestore+0x34/0x60
> ? _raw_spin_unlock_irqrestore+0x3c/0x60
> ? _raw_spin_unlock_irqrestore+0x34/0x60
> get_partial_node.part.0+0x1af/0x340
> ___slab_alloc+0xc07/0x1250
> ? do_vfs_ioctl+0xe5c/0x1660
> ? __x64_sys_ioctl+0xd5/0x1b0
> ? __alloc_object+0x39/0x660
> ? __pfx___might_resched+0x10/0x10
> ? __alloc_object+0x39/0x660
> ? kmem_cache_alloc+0x3cd/0x410
> ? should_failslab+0xe/0x20
> kmem_cache_alloc+0x3cd/0x410
> __alloc_object+0x39/0x660
> __create_object+0x22/0x90
> kmem_cache_alloc+0x324/0x410
> xfs_bui_init+0x1b/0x150
> xfs_bmap_update_create_intent+0x48/0x110
> ? __pfx_xfs_bmap_update_create_intent+0x10/0x10
> xfs_defer_create_intent+0xcc/0x1b0
> xfs_defer_create_intents+0x8f/0x230
> xfs_defer_finish_noroll+0x1c0/0x1160
> ? xfs_inode_item_precommit+0x2c1/0x880
> ? __create_object+0x5e/0x90
> ? __pfx_xfs_defer_finish_noroll+0x10/0x10
> ? xfs_trans_run_precommits+0x126/0x200
> __xfs_trans_commit+0x767/0xbe0
> ? inode_maybe_inc_iversion+0xe2/0x150
> ? __pfx___xfs_trans_commit+0x10/0x10
> xfs_reflink_remap_extent+0x654/0xd40
> ? __pfx_xfs_reflink_remap_extent+0x10/0x10
> ? __pfx_down_read_nested+0x10/0x10
> xfs_reflink_remap_blocks+0x21a/0x850
> ? __pfx_xfs_reflink_remap_blocks+0x10/0x10
> ? _raw_spin_unlock+0x23/0x40
> ? xfs_reflink_remap_prep+0x47d/0x900
> xfs_file_remap_range+0x296/0xb40
> ? __pfx_xfs_file_remap_range+0x10/0x10
> ? __pfx_lock_acquire+0x10/0x10
> ? __pfx___might_resched+0x10/0x10
> vfs_clone_file_range+0x260/0xc20
> ioctl_file_clone+0x49/0xb0
> do_vfs_ioctl+0xe5c/0x1660
> ? __pfx_do_vfs_ioctl+0x10/0x10
> ? trace_irq_enable.constprop.0+0xd2/0x110
> ? kasan_quarantine_put+0x7e/0x1d0
> ? do_sys_openat2+0x120/0x170
> ? lock_acquire+0x43b/0x4f0
> ? __pfx_lock_release+0x10/0x10
> ? __pfx_do_sys_openat2+0x10/0x10
> ? __do_sys_newfstatat+0x94/0xe0
> ? __fget_files+0x1ce/0x330
> __x64_sys_ioctl+0xd5/0x1b0
> do_syscall_64+0x6a/0x140
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7ffff7d1a94f
> </TASK>
> 
> 
> -ritesh
> 
> >
> > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
> > cc: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_bmap.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 656c95a22f2e..44d5381bc66f 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -6354,6 +6354,7 @@ xfs_bunmapi_range(
> >  		error = xfs_defer_finish(tpp);
> >  		if (error)
> >  			goto out;
> > +		cond_resched();
> >  	}
> >  out:
> >  	return error;
> > --
> > 2.44.0
> 