Hi Bhatnagar,

This looks similar to an issue I saw before with an fio test (buffered IO with 100 threads). There, too, the "ext4-rsv-conversion" workqueue consumed a lot of CPU and made every journal update get stuck. It was stuck at:

[<0>] do_get_write_access+0x291/0x350 [jbd2]
[<0>] jbd2_journal_get_write_access+0x67/0x90 [jbd2]
[<0>] __ext4_journal_get_write_access+0x44/0x90 [ext4]
[<0>] ext4_reserve_inode_write+0x83/0xc0 [ext4]
[<0>] __ext4_mark_inode_dirty+0x50/0x120 [ext4]
[<0>] ext4_convert_unwritten_extents+0x179/0x220 [ext4]
[<0>] ext4_convert_unwritten_io_end_vec+0x64/0xe0 [ext4]
[<0>] ext4_do_flush_completed_IO.isra.0+0xf5/0x190 [ext4]
[<0>] process_one_work+0x1b0/0x350
[<0>] worker_thread+0x49/0x310
[<0>] kthread+0x11b/0x140
[<0>] ret_from_fork+0x22/0x30

Lock statistics showed the contention was on the journal's j_state_lock:

&journal->j_state_lock-W: waittime-avg 178.75us, holdtime-avg 4.84us
&journal->j_state_lock-R: waittime-avg 1269.72us, holdtime-avg 11.07us

There is a patch below; can you check whether this is the same issue? Note this is not the final patch, since Ted raised some concerns about it. I will forward that email to you in a separate thread. I didn't pursue this patch further at the time because we thought it might not be the real cause in RDS.

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 091db733834e..b3c7544798b8 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5212,7 +5213,9 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
 	 * concurrency isn't really necessary. Limit it to 1.
 	 */
 	EXT4_SB(sb)->rsv_conversion_wq =
-		alloc_workqueue("ext4-rsv-conversion", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+		alloc_workqueue("ext4-rsv-conversion",
+				WQ_MEM_RECLAIM | WQ_UNBOUND | __WQ_ORDERED,
+				num_active_cpus() > 1 ?
+				num_active_cpus() : 1);

Thanks
Davina

-----Original Message-----
From: Bhatnagar, Rishabh <risbhat@xxxxxxxxxx>
Sent: Friday, August 18, 2023 4:59 AM
To: Jan Kara <jack@xxxxxxx>
Cc: Theodore Ts'o <tytso@xxxxxxx>; jack@xxxxxxxx; linux-ext4@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx; Park, SeongJae <sjpark@xxxxxxxxxx>
Subject: Re: Tasks stuck jbd2 for a long time

On 8/17/23 3:49 AM, Jan Kara wrote:
> On Wed 16-08-23 15:53:05, Bhatnagar, Rishabh wrote:
>> I collected a dump and looked at some processes that were stuck in
>> uninterruptible sleep. These are from the upstream stable tree:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/?h=linux-5.10.y
>> (5.10.191)
>>
>> One of them is the journal thread, which is waiting for some other
>> thread to close its transaction handle.
>>
>> PID: 10642  TASK: ffff9768823f4000  CPU: 37  COMMAND: "jbd2/md0-8"
>>  #0 [ffffbd6c40c17c60] __schedule+617 at ffffffffbb912df9
>>  #1 [ffffbd6c40c17cf8] schedule+60 at ffffffffbb91330c
>>  #2 [ffffbd6c40c17d08] jbd2_journal_commit_transaction+877 at ffffffffc016b90d [jbd2] (/home/ec2-user/linux/fs/jbd2/commit.c:497)
>>  #3 [ffffbd6c40c17ea0] kjournald2+282 at ffffffffc01723ba [jbd2] (/home/ec2-user/linux/fs/jbd2/journal.c:214)
>>  #4 [ffffbd6c40c17f10] kthread+279 at ffffffffbb0b9167
>>  #5 [ffffbd6c40c17f50] ret_from_fork+34 at ffffffffbb003802
> Yes, correct. This is waiting for transaction->t_updates to drop to 0.
>
>> Here is one of the threads that has started a handle and is waiting for the
>> journal to commit and unlock the current transaction. This stack only shows
>> ext4lazyinit, but with lazyinit disabled we have seen other threads
>> stuck in the same place.
>>
>> PID: 10644  TASK: ffff976901010000  CPU: 37  COMMAND: "ext4lazyinit"
>>  #0 [ffffbd6c40c1fbe0] __schedule+617 at ffffffffbb912df9
>>  #1 [ffffbd6c40c1fc78] schedule+60 at ffffffffbb91330c
>>  #2 [ffffbd6c40c1fc88] wait_transaction_locked+137 at ffffffffc0168089 [jbd2] (/home/ec2-user/linux/fs/jbd2/transaction.c:184)
>>  #3 [ffffbd6c40c1fcd8] add_transaction_credits+62 at ffffffffc016813e [jbd2] (/home/ec2-user/linux/fs/jbd2/transaction.c:241)
>>  #4 [ffffbd6c40c1fd30] start_this_handle+533 at ffffffffc0168615 [jbd2] (/home/ec2-user/linux/fs/jbd2/transaction.c:416)
>>  #5 [ffffbd6c40c1fdc0] jbd2__journal_start+244 at ffffffffc0168dc4 [jbd2]
>>  #6 [ffffbd6c40c1fe00] __ext4_journal_start_sb+250 at ffffffffc02ef65a [ext4]
>>  #7 [ffffbd6c40c1fe40] ext4_init_inode_table+190 at ffffffffc0302ace [ext4]
>>  #8 [ffffbd6c40c1feb0] ext4_lazyinit_thread+906 at ffffffffc033ec9a [ext4]
>>  #9 [ffffbd6c40c1ff10] kthread+279 at ffffffffbb0b9167
>> #10 [ffffbd6c40c1ff50] ret_from_fork+34 at ffffffffbb003802
> This thread actually didn't start a transaction. It is *trying* to
> start a transaction but it has failed and we are now waiting for
> transaction commit to proceed (i.e., for the jbd2/md0-8 process). So this
> isn't the process jbd2 is waiting for.
>
>> To replicate the download scenario I'm just using dd to copy random
>> data to disk. I launch a bunch of threads and try to stress the
>> system. Many of those threads seem to be stuck in
>> balance_dirty_pages_ratelimited, as can be seen below.
>>
>> PID: 10709  TASK: ffff9769016f8000  CPU: 25  COMMAND: "dd"
>>  #0 [ffffbd6c40dafa48] __schedule+617 at ffffffffbb912df9
>>  #1 [ffffbd6c40dafae0] schedule+60 at ffffffffbb91330c
>>  #2 [ffffbd6c40dafaf0] schedule_timeout+570 at ffffffffbb916a7a
>>  #3 [ffffbd6c40dafb68] io_schedule_timeout+25 at ffffffffbb913619 ((inlined by) io_schedule_finish at /home/ec2-user/linux/kernel/sched/core.c:6274)
>>  #4 [ffffbd6c40dafb80] balance_dirty_pages+654 at ffffffffbb2367ce (/home/ec2-user/linux/mm/page-writeback.c:1799)
>>  #5 [ffffbd6c40dafcf0] balance_dirty_pages_ratelimited+763 at ffffffffbb23752b (/home/ec2-user/linux/mm/page-writeback.c:1926)
>>  #6 [ffffbd6c40dafd18] generic_perform_write+308 at ffffffffbb22af44 (/home/ec2-user/linux/mm/filemap.c:3370)
>>  #7 [ffffbd6c40dafd88] ext4_buffered_write_iter+161 at ffffffffc02fcba1 [ext4] (/home/ec2-user/linux/fs/ext4/file.c:273)
>>  #8 [ffffbd6c40dafdb8] ext4_file_write_iter+96 at ffffffffc02fccf0 [ext4]
>>  #9 [ffffbd6c40dafe40] new_sync_write+287 at ffffffffbb2e0c0f
>> #10 [ffffbd6c40dafec8] vfs_write+481 at ffffffffbb2e3161
>> #11 [ffffbd6c40daff00] ksys_write+165 at ffffffffbb2e3385
>> #12 [ffffbd6c40daff40] do_syscall_64+51 at ffffffffbb906213
>> #13 [ffffbd6c40daff50] entry_SYSCALL_64_after_hwframe+103 at ffffffffbba000df
> Yes, this is waiting for page writeback to reduce the amount of dirty
> pages in the pagecache. We are not holding a transaction handle during
> this wait, so this is also not the task jbd2 is waiting for.
>
>> There are other dd threads that are trying to read and are handling
>> page faults. These are in runnable state, not uninterruptible sleep.
>>
>> PID: 14581  TASK: ffff97c3cfdbc000  CPU: 29  COMMAND: "dd"
>>  #0 [ffffbd6c4a1d3598] __schedule+617 at ffffffffbb912df9
>>  #1 [ffffbd6c4a1d3630] _cond_resched+38 at ffffffffbb9133e6
>>  #2 [ffffbd6c4a1d3638] shrink_page_list+126 at ffffffffbb2412fe
>>  #3 [ffffbd6c4a1d36c8] shrink_inactive_list+478 at ffffffffbb24441e
>>  #4 [ffffbd6c4a1d3768] shrink_lruvec+957 at ffffffffbb244e3d
>>  #5 [ffffbd6c4a1d3870] shrink_node+552 at ffffffffbb2452a8
>>  #6 [ffffbd6c4a1d38f0] do_try_to_free_pages+201 at ffffffffbb245829
>>  #7 [ffffbd6c4a1d3940] try_to_free_pages+239 at ffffffffbb246c0f
>>  #8 [ffffbd6c4a1d39d8] __alloc_pages_slowpath.constprop.114+913 at ffffffffbb28d741
>>  #9 [ffffbd6c4a1d3ab8] __alloc_pages_nodemask+679 at ffffffffbb28e2e7
>> #10 [ffffbd6c4a1d3b28] alloc_pages_vma+124 at ffffffffbb2a734c
>> #11 [ffffbd6c4a1d3b68] handle_mm_fault+3999 at ffffffffbb26de2f
>> #12 [ffffbd6c4a1d3c28] exc_page_fault+708 at ffffffffbb909c84
>> #13 [ffffbd6c4a1d3c80] asm_exc_page_fault+30 at ffffffffbba00b4e
>> #14 [ffffbd6c4a1d3d30] copyout+28 at ffffffffbb5160bc
>> #15 [ffffbd6c4a1d3d38] _copy_to_iter+158 at ffffffffbb5188de
>> #16 [ffffbd6c4a1d3d98] get_random_bytes_user+136 at ffffffffbb644608
>> #17 [ffffbd6c4a1d3e48] new_sync_read+284 at ffffffffbb2e0a5c
>> #18 [ffffbd6c4a1d3ed0] vfs_read+353 at ffffffffbb2e2f51
>> #19 [ffffbd6c4a1d3f00] ksys_read+165 at ffffffffbb2e3265
>> #20 [ffffbd6c4a1d3f40] do_syscall_64+51 at ffffffffbb906213
>> #21 [ffffbd6c4a1d3f50] entry_SYSCALL_64_after_hwframe+103 at ffffffffbba000df
> This process is in direct reclaim, trying to free more memory. It
> doesn't have a transaction handle started, so jbd2 also isn't waiting for
> this process.
>
>>>> Can low available memory be a reason for a thread to not be able to
>>>> close its transaction handle for a long time?
>>>> Maybe some writeback thread starts the handle but is not able to
>>>> complete writeback?
>>> Well, even that would be a bug, but low memory conditions are
>>> certainly among the less tested paths, so it is possible there's a bug lurking there.
>> Amongst the things we have tested, two seem to give good improvements.
>>
>> One is disabling journalling. We don't see any stuck tasks; the system
>> becomes slow but eventually recovers. However, it's not something we want to disable.
>>
>> The other is enabling swap memory. Adding some swap keeps the system
>> out of the low-memory state, and the system doesn't freeze.
> OK, these are just workarounds. The question really is which process
> holds the transaction handle the jbd2 thread is waiting for. It is none of
> the processes you have shown above. Since you have the crashdump, you
> can search all the processes and find those which have a non-zero
> task->journal_info. From these processes you can select those where
> journal_info points to an object from jbd2_handle_cache, and then you
> can verify whether the handles indeed point (through
> handle->h_transaction) to the transaction the jbd2 thread is trying to
> commit. After you've identified such a task, it is interesting to see what it is doing...

Hi Jan

I think I found the thread that is holding the transaction handle. It seems to be in runnable state, though. Its journal_info is set to a journal handle whose transaction matches the journal's running transaction. Here is the associated stack trace. It is converting unwritten extents to extents.
PID: 287  TASK: ffff976801890000  CPU: 20  COMMAND: "kworker/u96:35"
 #0 [ffffbd6c40b3f498] __schedule+617 at ffffffffbb912df9
 #1 [ffffbd6c40b3f530] _cond_resched+38 at ffffffffbb9133e6
 #2 [ffffbd6c40b3f538] shrink_lruvec+670 at ffffffffbb244d1e
 #3 [ffffbd6c40b3f640] _cond_resched+21 at ffffffffbb9133d5
 #4 [ffffbd6c40b3f648] shrink_node+552 at ffffffffbb2452a8
 #5 [ffffbd6c40b3f6c8] do_try_to_free_pages+201 at ffffffffbb245829
 #6 [ffffbd6c40b3f718] try_to_free_pages+239 at ffffffffbb246c0f
 #7 [ffffbd6c40b3f7b0] __alloc_pages_slowpath.constprop.114+913 at ffffffffbb28d741
 #8 [ffffbd6c40b3f890] __alloc_pages_nodemask+679 at ffffffffbb28e2e7
 #9 [ffffbd6c40b3f900] allocate_slab+726 at ffffffffbb2b0886
#10 [ffffbd6c40b3f958] ___slab_alloc+1173 at ffffffffbb2b3ff5
#11 [ffffbd6c40b3f988] insert_revoke_hash+37 at ffffffffc016f435 [jbd2] (/home/ec2-user/linux/fs/jbd2/revoke.c:146)
#12 [ffffbd6c40b3f9b8] kmem_cache_free+924 at ffffffffbb2b712c ((inlined by) slab_alloc at /home/ec2-user/linux/mm/slub.c:2904)
#13 [ffffbd6c40b3fa18] insert_revoke_hash+37 at ffffffffc016f435 [jbd2] (/home/ec2-user/linux/fs/jbd2/revoke.c:146)
#14 [ffffbd6c40b3fa40] kmem_cache_alloc+928 at ffffffffbb2b4590 (/home/ec2-user/linux/mm/slub.c:290)
#15 [ffffbd6c40b3fa78] insert_revoke_hash+37 at ffffffffc016f435 [jbd2] (/home/ec2-user/linux/fs/jbd2/revoke.c:146)
#16 [ffffbd6c40b3faa0] __ext4_forget+338 at ffffffffc02efb32 [ext4] (/home/ec2-user/linux/fs/ext4/ext4_jbd2.c:298)
#17 [ffffbd6c40b3fae0] ext4_free_blocks+2437 at ffffffffc031fd55 [ext4] (/home/ec2-user/linux/fs/ext4/mballoc.c:5709 (discriminator 2))
#18 [ffffbd6c40b3fbb0] ext4_ext_handle_unwritten_extents+596 at ffffffffc02f56a4 [ext4] ((inlined by) ext4_ext_handle_unwritten_extents at /home/ec2-user/linux/fs/ext4/extents.c:3892)
#19 [ffffbd6c40b3fc98] ext4_ext_map_blocks+1325 at ffffffffc02f710d [ext4] (/home/ec2-user/linux/fs/ext4/extents.c:4165)
#20 [ffffbd6c40b3fd60] ext4_map_blocks+813 at ffffffffc030bd0d [ext4] (/home/ec2-user/linux/fs/ext4/inode.c:659)
#21 [ffffbd6c40b3fdd0] ext4_convert_unwritten_extents+303 at ffffffffc02f8adf [ext4] (/home/ec2-user/linux/fs/ext4/extents.c:4810)
#22 [ffffbd6c40b3fe28] ext4_convert_unwritten_io_end_vec+95 at ffffffffc02f8c5f [ext4] (/home/ec2-user/linux/fs/ext4/extents.c:4850)
#23 [ffffbd6c40b3fe58] ext4_end_io_rsv_work+269 at ffffffffc032c3fd [ext4] ((inlined by) ext4_do_flush_completed_IO at /home/ec2-user/linux/fs/ext4/page-io.c:262)
#24 [ffffbd6c40b3fe98] process_one_work+405 at ffffffffbb0b2725
#25 [ffffbd6c40b3fed8] worker_thread+48 at ffffffffbb0b2920
#26 [ffffbd6c40b3ff10] kthread+279 at ffffffffbb0b9167
#27 [ffffbd6c40b3ff50] ret_from_fork+34 at ffffffffbb003802

Thanks
Rishabh
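For reference, the search Jan suggested (tasks with non-zero task->journal_info whose handle points at the running transaction) can be done mechanically in crash(8). The following is only a rough sketch: addresses are placeholders, and exact output depends on the kernel version and debuginfo loaded.

```
crash> foreach task -R journal_info
        # dump task->journal_info for every task; note the non-zero ones
crash> kmem <journal_info addr>
        # confirm the pointer belongs to the jbd2_handle_cache slab
crash> struct jbd2_journal_handle <journal_info addr>
        # read h_transaction from the candidate handle
crash> struct journal_s.j_running_transaction <journal addr>
        # compare with the transaction jbd2 is trying to commit
crash> bt <pid>
        # backtrace of the task whose handle matches
```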