Re: [PATCHv3 1/1] xfs: Add cond_resched to xfs_defer_finish_noroll

Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> · Tue, 07 May 2024 12:52:49 +0530

Dave Chinner <david@xxxxxxxxxxxxx> writes:

> On Tue, May 07, 2024 at 10:43:07AM +0530, Ritesh Harjani (IBM) wrote:
>> An async dio write to a sparse file can generate a lot of extents
>> and when we unlink this file (using rm), the kernel can be busy in umapping
>> and freeing those extents as part of transaction processing.
>> 
>> Similarly xfs reflink remapping path also goes through
>> xfs_defer_finish_noroll() for transaction processing. Since we can busy loop
>> in this function, so let's add cond_resched() to avoid softlockup messages
>> like these.
>
> You proposed this change two weeks ago: I said:
>
> "The problem is the number of extents being processed without
> yielding, not the time spent processing each individual deferred
> work chain to free the extent. Hence the explicit rescheduling
> should be at the top level loop where it can be easily explained and
> understand, not hidden deep inside the defer chain mechanism...."
>
> https://lore.kernel.org/linux-xfs/ZiWjbWrD60W%2F0s%2FF@xxxxxxxxxxxxxxxxxxx/
>
>> watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
>> CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S  L   6.9.0-rc5-0-default #1
>> Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
>> NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
>> LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
>> Call Trace:
>>   xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
>>   xfs_alloc_compute_aligned+0x5c/0x144
>>   xfs_alloc_ag_vextent_size+0x238/0x8d4
>>   xfs_alloc_fix_freelist+0x540/0x694
>>   xfs_free_extent_fix_freelist+0x84/0xe0
>>   __xfs_free_extent+0x74/0x1ec
>>   xfs_extent_free_finish_item+0xcc/0x214
>>   xfs_defer_finish_one+0x194/0x388
>>   xfs_defer_finish_noroll+0x1b4/0x5c8
>>   xfs_defer_finish+0x2c/0xc4
>>   xfs_bunmapi_range+0xa4/0x100
>>   xfs_itruncate_extents_flags+0x1b8/0x2f4
>>   xfs_inactive_truncate+0xe0/0x124
>>   xfs_inactive+0x30c/0x3e0
>
> This is the same issue as you originally reported two weeks ago. My
> comments about it still stand.
>
>>   xfs_inodegc_worker+0x140/0x234
>>   process_scheduled_works+0x240/0x57c
>>   worker_thread+0x198/0x468
>>   kthread+0x138/0x140
>>   start_kernel_thread+0x14/0x18
>> 
>> run fstests generic/175 at 2024-02-02 04:40:21
>> [   C17] watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
>>  watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
>>  CPU: 17 PID: 7679 Comm: xfs_io Kdump: loaded Tainted: G X 6.4.0
>>  NIP [c008000005e3ec94] xfs_rmapbt_diff_two_keys+0x54/0xe0 [xfs]
>>  LR [c008000005e08798] xfs_btree_get_leaf_keys+0x110/0x1e0 [xfs]
>>  Call Trace:
>>   0xc000000014107c00 (unreliable)
>>   __xfs_btree_updkeys+0x8c/0x2c0 [xfs]
>>   xfs_btree_update_keys+0x150/0x170 [xfs]
>>   xfs_btree_lshift+0x534/0x660 [xfs]
>>   xfs_btree_make_block_unfull+0x19c/0x240 [xfs]
>>   xfs_btree_insrec+0x4e4/0x630 [xfs]
>>   xfs_btree_insert+0x104/0x2d0 [xfs]
>>   xfs_rmap_insert+0xc4/0x260 [xfs]
>>   xfs_rmap_map_shared+0x228/0x630 [xfs]
>>   xfs_rmap_finish_one+0x2d4/0x350 [xfs]
>>   xfs_rmap_update_finish_item+0x44/0xc0 [xfs]
>>   xfs_defer_finish_noroll+0x2e4/0x740 [xfs]
>>   __xfs_trans_commit+0x1f4/0x400 [xfs]
>>   xfs_reflink_remap_extent+0x2d8/0x650 [xfs]
>>   xfs_reflink_remap_blocks+0x154/0x320 [xfs]
>
> And this is the same case of iterating millions of extents without
> ever blocking on a lock, a buffer cache miss, transaction
> reservation, etc. IOWs, xfs_reflink_remap_blocks() is another high
> level extent iterator that needs the cond_resched() calls.

ok. I was thinking if we add this to xfs_defer_finish_noroll(), then we
can just take care of all such cases. But I agree with your point here.

I will test adding cond_resched() to xfs_relink_remap_blocks() loop
and will re-submit this patch.

>
> I can't wait for the kernel to always be pre-emptible so this whole
> class of nasty hacks can go away forever. But in the mean time, they
> still need to be placed appropriately.

Sure Dave. Thanks for the review!

-ritesh