Dave Chinner <david@xxxxxxxxxxxxx> writes: > On Tue, May 07, 2024 at 10:43:07AM +0530, Ritesh Harjani (IBM) wrote: >> An async dio write to a sparse file can generate a lot of extents >> and when we unlink this file (using rm), the kernel can be busy in umapping >> and freeing those extents as part of transaction processing. >> >> Similarly xfs reflink remapping path also goes through >> xfs_defer_finish_noroll() for transaction processing. Since we can busy loop >> in this function, so let's add cond_resched() to avoid softlockup messages >> like these. > > You proposed this change two weeks ago: I said: > > "The problem is the number of extents being processed without > yielding, not the time spent processing each individual deferred > work chain to free the extent. Hence the explicit rescheduling > should be at the top level loop where it can be easily explained and > understand, not hidden deep inside the defer chain mechanism...." > > https://lore.kernel.org/linux-xfs/ZiWjbWrD60W%2F0s%2FF@xxxxxxxxxxxxxxxxxxx/ > >> watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435] >> CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S L 6.9.0-rc5-0-default #1 >> Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker >> NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290 >> LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290 >> Call Trace: >> xfs_alloc_get_rec+0x54/0x1b0 (unreliable) >> xfs_alloc_compute_aligned+0x5c/0x144 >> xfs_alloc_ag_vextent_size+0x238/0x8d4 >> xfs_alloc_fix_freelist+0x540/0x694 >> xfs_free_extent_fix_freelist+0x84/0xe0 >> __xfs_free_extent+0x74/0x1ec >> xfs_extent_free_finish_item+0xcc/0x214 >> xfs_defer_finish_one+0x194/0x388 >> xfs_defer_finish_noroll+0x1b4/0x5c8 >> xfs_defer_finish+0x2c/0xc4 >> xfs_bunmapi_range+0xa4/0x100 >> xfs_itruncate_extents_flags+0x1b8/0x2f4 >> xfs_inactive_truncate+0xe0/0x124 >> xfs_inactive+0x30c/0x3e0 > > This is the same issue as you originally reported two weeks ago. My > comments about it still stand. > >> xfs_inodegc_worker+0x140/0x234 >> process_scheduled_works+0x240/0x57c >> worker_thread+0x198/0x468 >> kthread+0x138/0x140 >> start_kernel_thread+0x14/0x18 >> >> run fstests generic/175 at 2024-02-02 04:40:21 >> [ C17] watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679] >> watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679] >> CPU: 17 PID: 7679 Comm: xfs_io Kdump: loaded Tainted: G X 6.4.0 >> NIP [c008000005e3ec94] xfs_rmapbt_diff_two_keys+0x54/0xe0 [xfs] >> LR [c008000005e08798] xfs_btree_get_leaf_keys+0x110/0x1e0 [xfs] >> Call Trace: >> 0xc000000014107c00 (unreliable) >> __xfs_btree_updkeys+0x8c/0x2c0 [xfs] >> xfs_btree_update_keys+0x150/0x170 [xfs] >> xfs_btree_lshift+0x534/0x660 [xfs] >> xfs_btree_make_block_unfull+0x19c/0x240 [xfs] >> xfs_btree_insrec+0x4e4/0x630 [xfs] >> xfs_btree_insert+0x104/0x2d0 [xfs] >> xfs_rmap_insert+0xc4/0x260 [xfs] >> xfs_rmap_map_shared+0x228/0x630 [xfs] >> xfs_rmap_finish_one+0x2d4/0x350 [xfs] >> xfs_rmap_update_finish_item+0x44/0xc0 [xfs] >> xfs_defer_finish_noroll+0x2e4/0x740 [xfs] >> __xfs_trans_commit+0x1f4/0x400 [xfs] >> xfs_reflink_remap_extent+0x2d8/0x650 [xfs] >> xfs_reflink_remap_blocks+0x154/0x320 [xfs] > > And this is the same case of iterating millions of extents without > ever blocking on a lock, a buffer cache miss, transaction > reservation, etc. IOWs, xfs_reflink_remap_blocks() is another high > level extent iterator that needs the cond_resched() calls. ok. I was thinking if we add this to xfs_defer_finish_noroll(), then we can just take care of all such cases. But I agree with your point here. I will test adding cond_resched() to xfs_relink_remap_blocks() loop and will re-submit this patch. > > I can't wait for the kernel to always be pre-emptible so this whole > class of nasty hacks can go away forever. But in the mean time, they > still need to be placed appropriately. Sure Dave. Thanks for the review! -ritesh