Hi, 在 2024/06/03 19:09, Darius Ski 写道:
Hi all, we have 5 device RAID5 array of SSD disks and yesterday ( during Debian monthly cron mdadm/checkarray check ) experienced lockup under light writes from our FileWrServer. Array HW was fine before, during and after, just the resync, our app threads were locked up completely. The system completed such check during first Sunday of April and had no troubles reading/writing 20+ terabytes of data during system prep. cat /proc/mdstat Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] md0 : active raid5 nvme8n1p1[5] nvme7n1p1[3] nvme0n1p1[0] nvme4n1p1[2] nvme2n1p1[1] 120022794240 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU] bitmap: 18/224 pages [72KB], 65536KB chunk [Sun Jun 2 16:38:36 2024] INFO: task md0_raid5:1545 blocked for more than 120 seconds. [Sun Jun 2 16:38:36 2024] Not tainted 6.8.7-custom #1 [Sun Jun 2 16:38:36 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Sun Jun 2 16:38:36 2024] task:md0_raid5 state:D stack:0 pid:1545 tgid:1545 ppid:2 flags:0x00004000 [Sun Jun 2 16:38:36 2024] Call Trace: [Sun Jun 2 16:38:36 2024] <TASK> [Sun Jun 2 16:38:36 2024] __schedule+0x36a/0x9a0 [Sun Jun 2 16:38:36 2024] schedule+0x27/0xc0 [Sun Jun 2 16:38:36 2024] raid5d+0x466/0x670 [Sun Jun 2 16:38:36 2024] ? __schedule+0x372/0x9a0 [Sun Jun 2 16:38:36 2024] ? cpuacct_stats_show+0x120/0x120 [Sun Jun 2 16:38:36 2024] ? super_90_load.part.0+0x340/0x340 [Sun Jun 2 16:38:36 2024] md_thread+0x94/0x120 [Sun Jun 2 16:38:36 2024] ? cpuacct_stats_show+0x120/0x120 [Sun Jun 2 16:38:36 2024] kthread+0xc3/0xf0 [Sun Jun 2 16:38:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:38:36 2024] ret_from_fork+0x2d/0x50 [Sun Jun 2 16:38:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:38:36 2024] ret_from_fork_asm+0x11/0x20 [Sun Jun 2 16:38:36 2024] </TASK> [Sun Jun 2 16:38:36 2024] INFO: task jbd2/md0p1-8:1963 blocked for more than 120 seconds. [Sun Jun 2 16:38:36 2024] Not tainted 6.8.7-custom #1 [Sun Jun 2 16:38:36 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Sun Jun 2 16:38:36 2024] task:jbd2/md0p1-8 state:D stack:0 pid:1963 tgid:1963 ppid:2 flags:0x00004000 [Sun Jun 2 16:38:36 2024] Call Trace: [Sun Jun 2 16:38:36 2024] <TASK> [Sun Jun 2 16:38:36 2024] __schedule+0x36a/0x9a0 [Sun Jun 2 16:38:36 2024] ? __queue_work+0x19f/0x3b0 [Sun Jun 2 16:38:36 2024] schedule+0x27/0xc0 [Sun Jun 2 16:38:36 2024] md_write_start+0x10d/0x200 [Sun Jun 2 16:38:36 2024] ? cpuacct_stats_show+0x120/0x120 [Sun Jun 2 16:38:36 2024] raid5_make_request+0x8f/0x1220 [Sun Jun 2 16:38:36 2024] ? xas_store+0x2de/0x610 [Sun Jun 2 16:38:36 2024] ? xas_load+0x2c/0x40 [Sun Jun 2 16:38:36 2024] ? sched_feat_write+0x150/0x150 [Sun Jun 2 16:38:36 2024] ? alloc_buffer_head+0x1a/0x60 [Sun Jun 2 16:38:36 2024] ? filemap_get_entry+0x53/0x100 [Sun Jun 2 16:38:36 2024] md_handle_request+0x132/0x210 [Sun Jun 2 16:38:36 2024] ? __bio_split_to_limits+0x8c/0x250 [Sun Jun 2 16:38:36 2024] ? bio_split_to_limits+0x3d/0x60 [Sun Jun 2 16:38:36 2024] __submit_bio+0x46/0xd0 [Sun Jun 2 16:38:36 2024] submit_bio_noacct_nocheck+0x11b/0x330 [Sun Jun 2 16:38:36 2024] jbd2_journal_commit_transaction+0xc7a/0x1930 [Sun Jun 2 16:38:36 2024] ? __schedule+0x372/0x9a0 [Sun Jun 2 16:38:36 2024] kjournald2+0x95/0x220 [Sun Jun 2 16:38:36 2024] ? cpuacct_stats_show+0x120/0x120 [Sun Jun 2 16:38:36 2024] ? jbd2_fc_wait_bufs+0x90/0x90 [Sun Jun 2 16:38:36 2024] kthread+0xc3/0xf0 [Sun Jun 2 16:38:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:38:36 2024] ret_from_fork+0x2d/0x50 [Sun Jun 2 16:38:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:38:36 2024] ret_from_fork_asm+0x11/0x20 [Sun Jun 2 16:38:36 2024] </TASK> [Sun Jun 2 16:38:36 2024] INFO: task FileWrServerEve:2020854 blocked for more than 120 seconds. [Sun Jun 2 16:38:36 2024] Not tainted 6.8.7-custom #1 [Sun Jun 2 16:38:36 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Sun Jun 2 16:38:36 2024] task:FileWrServerEve state:D stack:0 pid:2020854 tgid:2020739 ppid:1 flags:0x00000002 [Sun Jun 2 16:38:36 2024] Call Trace: [Sun Jun 2 16:38:36 2024] <TASK> [Sun Jun 2 16:38:36 2024] __schedule+0x36a/0x9a0 [Sun Jun 2 16:38:36 2024] ? ext4_getblk+0xb7/0x2c0 [Sun Jun 2 16:38:36 2024] schedule+0x27/0xc0 [Sun Jun 2 16:38:36 2024] io_schedule+0x42/0x70 [Sun Jun 2 16:38:36 2024] bit_wait_io+0xd/0x60 [Sun Jun 2 16:38:36 2024] __wait_on_bit+0x48/0x140 [Sun Jun 2 16:38:36 2024] ? bit_wait+0x60/0x60 [Sun Jun 2 16:38:36 2024] out_of_line_wait_on_bit+0x81/0x90 [Sun Jun 2 16:38:36 2024] ? pick_next_task_stop+0x70/0x70 [Sun Jun 2 16:38:36 2024] do_get_write_access+0x235/0x3a0 [Sun Jun 2 16:38:36 2024] jbd2_journal_get_write_access+0x88/0xb0 [Sun Jun 2 16:38:36 2024] __ext4_journal_get_write_access+0x3e/0x160 [Sun Jun 2 16:38:36 2024] __ext4_new_inode+0x5a3/0x1730 [Sun Jun 2 16:38:36 2024] ext4_create+0xe9/0x1a0 [Sun Jun 2 16:38:36 2024] path_openat+0xdca/0x1080 [Sun Jun 2 16:38:36 2024] do_filp_open+0xa1/0x130 [Sun Jun 2 16:38:36 2024] do_sys_openat2+0x74/0xa0 [Sun Jun 2 16:38:36 2024] __x64_sys_openat+0x5c/0x70 [Sun Jun 2 16:38:36 2024] do_syscall_64+0x3a/0xc0 [Sun Jun 2 16:38:36 2024] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [Sun Jun 2 16:38:36 2024] RIP: 0033:0x7f6b89289f80 [Sun Jun 2 16:38:36 2024] RSP: 002b:00007f6b5962d540 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 [Sun Jun 2 16:38:36 2024] RAX: ffffffffffffffda RBX: 00000000000000c1 RCX: 00007f6b89289f80 [Sun Jun 2 16:38:36 2024] RDX: 00000000000000c1 RSI: 00007f6a6c5d1ee0 RDI: 00000000ffffff9c [Sun Jun 2 16:38:36 2024] RBP: 00007f6a6c5d1ee0 R08: 0000000000000000 R09: 00000000fb82ea15 [Sun Jun 2 16:38:36 2024] R10: 00000000000081a4 R11: 0000000000000293 R12: 00000000000081a4 [Sun Jun 2 16:38:36 2024] R13: 00000000000000c1 R14: 00007f6a6c5d1ee0 R15: 00007f6b80ec9080 [Sun Jun 2 16:38:36 2024] </TASK> [Sun Jun 2 16:38:36 2024] INFO: task md0_resync:716263 blocked for more than 120 seconds. [Sun Jun 2 16:38:36 2024] Not tainted 6.8.7-custom #1 [Sun Jun 2 16:38:36 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Sun Jun 2 16:38:36 2024] task:md0_resync state:D stack:0 pid:716263 tgid:716263 ppid:2 flags:0x00004000 [Sun Jun 2 16:38:36 2024] Call Trace: [Sun Jun 2 16:38:36 2024] <TASK> [Sun Jun 2 16:38:36 2024] __schedule+0x36a/0x9a0 [Sun Jun 2 16:38:36 2024] schedule+0x27/0xc0 [Sun Jun 2 16:38:36 2024] raid5_get_active_stripe+0x1f9/0x4c0 [Sun Jun 2 16:38:36 2024] ? cpuacct_stats_show+0x120/0x120 [Sun Jun 2 16:38:36 2024] raid5_sync_request+0x34f/0x370 [Sun Jun 2 16:38:36 2024] ? is_mddev_idle+0xe7/0x150 [Sun Jun 2 16:38:36 2024] md_do_sync+0x668/0xfa0 [Sun Jun 2 16:38:36 2024] ? cpuacct_stats_show+0x120/0x120 [Sun Jun 2 16:38:36 2024] ? super_90_load.part.0+0x340/0x340 [Sun Jun 2 16:38:36 2024] md_thread+0x94/0x120 [Sun Jun 2 16:38:36 2024] kthread+0xc3/0xf0 [Sun Jun 2 16:38:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:38:36 2024] ret_from_fork+0x2d/0x50 [Sun Jun 2 16:38:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:38:36 2024] ret_from_fork_asm+0x11/0x20 [Sun Jun 2 16:38:36 2024] </TASK> [Sun Jun 2 16:38:36 2024] INFO: task kworker/u385:4:1561579 blocked for more than 120 seconds. [Sun Jun 2 16:38:36 2024] Not tainted 6.8.7-custom #1 [Sun Jun 2 16:38:36 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Sun Jun 2 16:38:36 2024] task:kworker/u385:4 state:D stack:0 pid:1561579 tgid:1561579 ppid:2 flags:0x00004000 [Sun Jun 2 16:38:36 2024] Workqueue: writeback wb_workfn (flush-9:0) [Sun Jun 2 16:38:36 2024] Call Trace: [Sun Jun 2 16:38:36 2024] <TASK> [Sun Jun 2 16:38:36 2024] __schedule+0x36a/0x9a0 [Sun Jun 2 16:38:36 2024] schedule+0x27/0xc0 [Sun Jun 2 16:38:36 2024] io_schedule+0x42/0x70 [Sun Jun 2 16:38:36 2024] bit_wait_io+0xd/0x60 [Sun Jun 2 16:38:36 2024] __wait_on_bit+0x48/0x140 [Sun Jun 2 16:38:36 2024] ? bit_wait+0x60/0x60 [Sun Jun 2 16:38:36 2024] out_of_line_wait_on_bit+0x81/0x90 [Sun Jun 2 16:38:36 2024] ? pick_next_task_stop+0x70/0x70 [Sun Jun 2 16:38:36 2024] do_get_write_access+0x235/0x3a0 [Sun Jun 2 16:38:36 2024] jbd2_journal_get_write_access+0x88/0xb0 [Sun Jun 2 16:38:36 2024] __ext4_journal_get_write_access+0x3e/0x160 [Sun Jun 2 16:38:36 2024] ext4_mb_mark_context+0x93/0x390 [Sun Jun 2 16:38:36 2024] ext4_mb_mark_diskspace_used+0xba/0x190 [Sun Jun 2 16:38:36 2024] ext4_mb_new_blocks+0x13f/0xdc0 [Sun Jun 2 16:38:36 2024] ? ext4_find_extent+0x330/0x420 [Sun Jun 2 16:38:36 2024] ? ext4_find_extent+0x3bf/0x420 [Sun Jun 2 16:38:36 2024] ? release_pages+0x141/0x3b0 [Sun Jun 2 16:38:36 2024] ext4_ext_map_blocks+0x35b/0x1790 [Sun Jun 2 16:38:36 2024] ? release_pages+0x141/0x3b0 [Sun Jun 2 16:38:36 2024] ? filemap_get_folios_tag+0x6c/0x1b0 [Sun Jun 2 16:38:36 2024] ? mpage_prepare_extent_to_map+0x439/0x470 [Sun Jun 2 16:38:36 2024] ext4_map_blocks+0x164/0x5d0 [Sun Jun 2 16:38:36 2024] ext4_do_writepages+0x694/0xba0 [Sun Jun 2 16:38:36 2024] ? iommu_map_sg+0x113/0x1b0 [Sun Jun 2 16:38:36 2024] ext4_writepages+0x87/0x120 [Sun Jun 2 16:38:36 2024] do_writepages+0xac/0x160 [Sun Jun 2 16:38:36 2024] ? wb_calc_thresh+0x41/0x50 [Sun Jun 2 16:38:36 2024] __writeback_single_inode+0x39/0x2b0 [Sun Jun 2 16:38:36 2024] writeback_sb_inodes+0x1ab/0x430 [Sun Jun 2 16:38:36 2024] __writeback_inodes_wb+0x4c/0xe0 [Sun Jun 2 16:38:36 2024] wb_writeback+0x1fd/0x250 [Sun Jun 2 16:38:36 2024] wb_workfn+0x23d/0x400 [Sun Jun 2 16:38:36 2024] process_one_work+0x13f/0x2c0 [Sun Jun 2 16:38:36 2024] worker_thread+0x26d/0x390 [Sun Jun 2 16:38:36 2024] ? rescuer_thread+0x380/0x380 [Sun Jun 2 16:38:36 2024] kthread+0xc3/0xf0 [Sun Jun 2 16:38:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:38:36 2024] ret_from_fork+0x2d/0x50 [Sun Jun 2 16:38:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:38:36 2024] ret_from_fork_asm+0x11/0x20 [Sun Jun 2 16:38:36 2024] </TASK> [Sun Jun 2 16:40:36 2024] INFO: task md0_raid5:1545 blocked for more than 241 seconds. [Sun Jun 2 16:40:36 2024] Not tainted 6.8.7-custom #1 [Sun Jun 2 16:40:36 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Sun Jun 2 16:40:36 2024] task:md0_raid5 state:D stack:0 pid:1545 tgid:1545 ppid:2 flags:0x00004000 [Sun Jun 2 16:40:36 2024] Call Trace: [Sun Jun 2 16:40:36 2024] <TASK> [Sun Jun 2 16:40:36 2024] __schedule+0x36a/0x9a0 [Sun Jun 2 16:40:36 2024] schedule+0x27/0xc0 [Sun Jun 2 16:40:36 2024] raid5d+0x466/0x670 [Sun Jun 2 16:40:36 2024] ? __schedule+0x372/0x9a0 [Sun Jun 2 16:40:36 2024] ? cpuacct_stats_show+0x120/0x120 [Sun Jun 2 16:40:36 2024] ? super_90_load.part.0+0x340/0x340 [Sun Jun 2 16:40:36 2024] md_thread+0x94/0x120 [Sun Jun 2 16:40:36 2024] ? cpuacct_stats_show+0x120/0x120 [Sun Jun 2 16:40:36 2024] kthread+0xc3/0xf0 [Sun Jun 2 16:40:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:40:36 2024] ret_from_fork+0x2d/0x50 [Sun Jun 2 16:40:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:40:36 2024] ret_from_fork_asm+0x11/0x20 [Sun Jun 2 16:40:36 2024] </TASK> [Sun Jun 2 16:40:36 2024] INFO: task jbd2/md0p1-8:1963 blocked for more than 241 seconds. [Sun Jun 2 16:40:36 2024] Not tainted 6.8.7-custom #1 [Sun Jun 2 16:40:36 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Sun Jun 2 16:40:36 2024] task:jbd2/md0p1-8 state:D stack:0 pid:1963 tgid:1963 ppid:2 flags:0x00004000 [Sun Jun 2 16:40:36 2024] Call Trace: [Sun Jun 2 16:40:36 2024] <TASK> [Sun Jun 2 16:40:36 2024] __schedule+0x36a/0x9a0 [Sun Jun 2 16:40:36 2024] ? __queue_work+0x19f/0x3b0 [Sun Jun 2 16:40:36 2024] schedule+0x27/0xc0 [Sun Jun 2 16:40:36 2024] md_write_start+0x10d/0x200 [Sun Jun 2 16:40:36 2024] ? cpuacct_stats_show+0x120/0x120 [Sun Jun 2 16:40:36 2024] raid5_make_request+0x8f/0x1220 [Sun Jun 2 16:40:36 2024] ? xas_store+0x2de/0x610 [Sun Jun 2 16:40:36 2024] ? xas_load+0x2c/0x40 [Sun Jun 2 16:40:36 2024] ? sched_feat_write+0x150/0x150 [Sun Jun 2 16:40:36 2024] ? alloc_buffer_head+0x1a/0x60 [Sun Jun 2 16:40:36 2024] ? filemap_get_entry+0x53/0x100 [Sun Jun 2 16:40:36 2024] md_handle_request+0x132/0x210 [Sun Jun 2 16:40:36 2024] ? __bio_split_to_limits+0x8c/0x250 [Sun Jun 2 16:40:36 2024] ? bio_split_to_limits+0x3d/0x60 [Sun Jun 2 16:40:36 2024] __submit_bio+0x46/0xd0 [Sun Jun 2 16:40:36 2024] submit_bio_noacct_nocheck+0x11b/0x330 [Sun Jun 2 16:40:36 2024] jbd2_journal_commit_transaction+0xc7a/0x1930 [Sun Jun 2 16:40:36 2024] ? __schedule+0x372/0x9a0 [Sun Jun 2 16:40:36 2024] kjournald2+0x95/0x220 [Sun Jun 2 16:40:36 2024] ? cpuacct_stats_show+0x120/0x120 [Sun Jun 2 16:40:36 2024] ? jbd2_fc_wait_bufs+0x90/0x90 [Sun Jun 2 16:40:36 2024] kthread+0xc3/0xf0 [Sun Jun 2 16:40:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:40:36 2024] ret_from_fork+0x2d/0x50 [Sun Jun 2 16:40:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:40:36 2024] ret_from_fork_asm+0x11/0x20 [Sun Jun 2 16:40:36 2024] </TASK> [Sun Jun 2 16:40:36 2024] INFO: task FileWrServerEve:2020854 blocked for more than 241 seconds. [Sun Jun 2 16:40:36 2024] Not tainted 6.8.7-custom #1 [Sun Jun 2 16:40:36 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Sun Jun 2 16:40:36 2024] task:FileWrServerEve state:D stack:0 pid:2020854 tgid:2020739 ppid:1 flags:0x00000002 [Sun Jun 2 16:40:36 2024] Call Trace: [Sun Jun 2 16:40:36 2024] <TASK> [Sun Jun 2 16:40:36 2024] __schedule+0x36a/0x9a0 [Sun Jun 2 16:40:36 2024] ? ext4_getblk+0xb7/0x2c0 [Sun Jun 2 16:40:36 2024] schedule+0x27/0xc0 [Sun Jun 2 16:40:36 2024] io_schedule+0x42/0x70 [Sun Jun 2 16:40:36 2024] bit_wait_io+0xd/0x60 [Sun Jun 2 16:40:36 2024] __wait_on_bit+0x48/0x140 [Sun Jun 2 16:40:36 2024] ? bit_wait+0x60/0x60 [Sun Jun 2 16:40:36 2024] out_of_line_wait_on_bit+0x81/0x90 [Sun Jun 2 16:40:36 2024] ? pick_next_task_stop+0x70/0x70 [Sun Jun 2 16:40:36 2024] do_get_write_access+0x235/0x3a0 [Sun Jun 2 16:40:36 2024] jbd2_journal_get_write_access+0x88/0xb0 [Sun Jun 2 16:40:36 2024] __ext4_journal_get_write_access+0x3e/0x160 [Sun Jun 2 16:40:36 2024] __ext4_new_inode+0x5a3/0x1730 [Sun Jun 2 16:40:36 2024] ext4_create+0xe9/0x1a0 [Sun Jun 2 16:40:36 2024] path_openat+0xdca/0x1080 [Sun Jun 2 16:40:36 2024] do_filp_open+0xa1/0x130 [Sun Jun 2 16:40:36 2024] do_sys_openat2+0x74/0xa0 [Sun Jun 2 16:40:36 2024] __x64_sys_openat+0x5c/0x70 [Sun Jun 2 16:40:36 2024] do_syscall_64+0x3a/0xc0 [Sun Jun 2 16:40:36 2024] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [Sun Jun 2 16:40:36 2024] RIP: 0033:0x7f6b89289f80 [Sun Jun 2 16:40:36 2024] RSP: 002b:00007f6b5962d540 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 [Sun Jun 2 16:40:36 2024] RAX: ffffffffffffffda RBX: 00000000000000c1 RCX: 00007f6b89289f80 [Sun Jun 2 16:40:36 2024] RDX: 00000000000000c1 RSI: 00007f6a6c5d1ee0 RDI: 00000000ffffff9c [Sun Jun 2 16:40:36 2024] RBP: 00007f6a6c5d1ee0 R08: 0000000000000000 R09: 00000000fb82ea15 [Sun Jun 2 16:40:36 2024] R10: 00000000000081a4 R11: 0000000000000293 R12: 00000000000081a4 [Sun Jun 2 16:40:36 2024] R13: 00000000000000c1 R14: 00007f6a6c5d1ee0 R15: 00007f6b80ec9080 [Sun Jun 2 16:40:36 2024] </TASK> [Sun Jun 2 16:40:36 2024] INFO: task md0_resync:716263 blocked for more than 241 seconds. [Sun Jun 2 16:40:36 2024] Not tainted 6.8.7-custom #1 [Sun Jun 2 16:40:36 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Sun Jun 2 16:40:36 2024] task:md0_resync state:D stack:0 pid:716263 tgid:716263 ppid:2 flags:0x00004000 [Sun Jun 2 16:40:36 2024] Call Trace: [Sun Jun 2 16:40:36 2024] <TASK> [Sun Jun 2 16:40:36 2024] __schedule+0x36a/0x9a0 [Sun Jun 2 16:40:36 2024] schedule+0x27/0xc0 [Sun Jun 2 16:40:36 2024] raid5_get_active_stripe+0x1f9/0x4c0 [Sun Jun 2 16:40:36 2024] ? cpuacct_stats_show+0x120/0x120 [Sun Jun 2 16:40:36 2024] raid5_sync_request+0x34f/0x370 [Sun Jun 2 16:40:36 2024] ? is_mddev_idle+0xe7/0x150 [Sun Jun 2 16:40:36 2024] md_do_sync+0x668/0xfa0 [Sun Jun 2 16:40:36 2024] ? cpuacct_stats_show+0x120/0x120 [Sun Jun 2 16:40:36 2024] ? super_90_load.part.0+0x340/0x340 [Sun Jun 2 16:40:36 2024] md_thread+0x94/0x120 [Sun Jun 2 16:40:36 2024] kthread+0xc3/0xf0 [Sun Jun 2 16:40:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:40:36 2024] ret_from_fork+0x2d/0x50 [Sun Jun 2 16:40:36 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:40:36 2024] ret_from_fork_asm+0x11/0x20 [Sun Jun 2 16:40:36 2024] </TASK> [Sun Jun 2 16:40:36 2024] INFO: task kworker/u385:4:1561579 blocked for more than 241 seconds. [Sun Jun 2 16:40:36 2024] Not tainted 6.8.7-custom #1 [Sun Jun 2 16:40:37 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Sun Jun 2 16:40:37 2024] task:kworker/u385:4 state:D stack:0 pid:1561579 tgid:1561579 ppid:2 flags:0x00004000 [Sun Jun 2 16:40:37 2024] Workqueue: writeback wb_workfn (flush-9:0) [Sun Jun 2 16:40:37 2024] Call Trace: [Sun Jun 2 16:40:37 2024] <TASK> [Sun Jun 2 16:40:37 2024] __schedule+0x36a/0x9a0 [Sun Jun 2 16:40:37 2024] schedule+0x27/0xc0 [Sun Jun 2 16:40:37 2024] io_schedule+0x42/0x70 [Sun Jun 2 16:40:37 2024] bit_wait_io+0xd/0x60 [Sun Jun 2 16:40:37 2024] __wait_on_bit+0x48/0x140 [Sun Jun 2 16:40:37 2024] ? bit_wait+0x60/0x60 [Sun Jun 2 16:40:37 2024] out_of_line_wait_on_bit+0x81/0x90 [Sun Jun 2 16:40:37 2024] ? pick_next_task_stop+0x70/0x70 [Sun Jun 2 16:40:37 2024] do_get_write_access+0x235/0x3a0 [Sun Jun 2 16:40:37 2024] jbd2_journal_get_write_access+0x88/0xb0 [Sun Jun 2 16:40:37 2024] __ext4_journal_get_write_access+0x3e/0x160 [Sun Jun 2 16:40:37 2024] ext4_mb_mark_context+0x93/0x390 [Sun Jun 2 16:40:37 2024] ext4_mb_mark_diskspace_used+0xba/0x190 [Sun Jun 2 16:40:37 2024] ext4_mb_new_blocks+0x13f/0xdc0 [Sun Jun 2 16:40:37 2024] ? ext4_find_extent+0x330/0x420 [Sun Jun 2 16:40:37 2024] ? ext4_find_extent+0x3bf/0x420 [Sun Jun 2 16:40:37 2024] ? release_pages+0x141/0x3b0 [Sun Jun 2 16:40:37 2024] ext4_ext_map_blocks+0x35b/0x1790 [Sun Jun 2 16:40:37 2024] ? release_pages+0x141/0x3b0 [Sun Jun 2 16:40:37 2024] ? filemap_get_folios_tag+0x6c/0x1b0 [Sun Jun 2 16:40:37 2024] ? mpage_prepare_extent_to_map+0x439/0x470 [Sun Jun 2 16:40:37 2024] ext4_map_blocks+0x164/0x5d0 [Sun Jun 2 16:40:37 2024] ext4_do_writepages+0x694/0xba0 [Sun Jun 2 16:40:37 2024] ? iommu_map_sg+0x113/0x1b0 [Sun Jun 2 16:40:37 2024] ext4_writepages+0x87/0x120 [Sun Jun 2 16:40:37 2024] do_writepages+0xac/0x160 [Sun Jun 2 16:40:37 2024] ? wb_calc_thresh+0x41/0x50 [Sun Jun 2 16:40:37 2024] __writeback_single_inode+0x39/0x2b0 [Sun Jun 2 16:40:37 2024] writeback_sb_inodes+0x1ab/0x430 [Sun Jun 2 16:40:37 2024] __writeback_inodes_wb+0x4c/0xe0 [Sun Jun 2 16:40:37 2024] wb_writeback+0x1fd/0x250 [Sun Jun 2 16:40:37 2024] wb_workfn+0x23d/0x400 [Sun Jun 2 16:40:37 2024] process_one_work+0x13f/0x2c0 [Sun Jun 2 16:40:37 2024] worker_thread+0x26d/0x390 [Sun Jun 2 16:40:37 2024] ? rescuer_thread+0x380/0x380 [Sun Jun 2 16:40:37 2024] kthread+0xc3/0xf0 [Sun Jun 2 16:40:37 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:40:37 2024] ret_from_fork+0x2d/0x50 [Sun Jun 2 16:40:37 2024] ? kthread_complete_and_exit+0x20/0x20 [Sun Jun 2 16:40:37 2024] ret_from_fork_asm+0x11/0x20 [Sun Jun 2 16:40:37 2024] </TASK> [Sun Jun 2 16:40:37 2024] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings Anyone got insights at what happened here? The write workload from our app is minor during Sundays, maybe several files per minute and several megabytes in size.
This looks like reported problem: https://lore.kernel.org/all/3bb2549f-846a-4179-9e23-407235a06753@xxxxxxxx/ https://lore.kernel.org/all/20240123005700.9302-1-dan@xxxxxxxx/ And the fix patch is(merged in v6.10-rc1): https://lore.kernel.org/all/20240322081005.1112401-1-yukuai1@xxxxxxxxxxxxxxx/ Thanks, Kuai
Best regards, Darius Ski. P.S. Running custom build of 6.8.7 and from quick glance in mailing list did not see anything similar, sorry if it's dublicate and/or solved already. .