Well, not hearing any response, I had to try something. I rebooted, and the reshape initially picked up again, but after a couple of minutes it hung again. This time I got the same dmesg messages about the reshape, but also hung-task reports for fsck.ext4 and a kworker. I'm not sure how the kworker is related; maybe someone can provide some insight.

[ 246.970484] INFO: task kworker/u32:6:106 blocked for more than 122 seconds.
[ 246.970506] Tainted: G OE 6.9.3-060903-generic #202405300957
[ 246.970514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.970521] task:kworker/u32:6 state:D stack:0 pid:106 tgid:106 ppid:2 flags:0x00004000
[ 246.970536] Workqueue: writeback wb_workfn (flush-9:2)
[ 246.970555] Call Trace:
[ 246.970561] <TASK>
[ 246.970570] __schedule+0x279/0x6a0
[ 246.970586] schedule+0x29/0xd0
[ 246.970597] wait_barrier.part.0+0x180/0x1e0 [raid10]
[ 246.970624] ? __pfx_autoremove_wake_function+0x10/0x10
[ 246.970647] wait_barrier+0x70/0xc0 [raid10]
[ 246.970667] regular_request_wait+0x42/0x1d0 [raid10]
[ 246.970686] ? bio_associate_blkg_from_css+0xf8/0x330
[ 246.970696] ? __kmalloc+0x1c0/0x4e0
[ 246.970706] raid10_write_request+0x164/0x5f0 [raid10]
[ 246.970725] ? r10bio_pool_alloc+0x28/0x40 [raid10]
[ 246.970743] ? r10bio_pool_alloc+0x28/0x40 [raid10]
[ 246.970763] raid10_make_request+0xea/0x1a0 [raid10]
[ 246.970783] md_handle_request+0x15d/0x280
[ 246.970797] md_submit_bio+0x63/0xb0
[ 246.970807] __submit_bio+0xe7/0x1c0
[ 246.970815] __submit_bio_noacct+0x91/0x220
[ 246.970823] submit_bio_noacct_nocheck+0x205/0x240
[ 246.970832] submit_bio_noacct+0x162/0x5a0
[ 246.970840] submit_bio+0xb1/0x110
[ 246.970847] submit_bh_wbc+0x15e/0x190
[ 246.970855] __block_write_full_folio+0x1e3/0x420
[ 246.970864] ? __pfx_blkdev_get_block+0x10/0x10
[ 246.970873] ? __pfx_blkdev_get_block+0x10/0x10
[ 246.970881] block_write_full_folio+0x150/0x180
[ 246.970887] ? __pfx_blkdev_get_block+0x10/0x10
[ 246.970895] ? __pfx_blkdev_get_block+0x10/0x10
[ 246.970901] ? __pfx_block_write_full_folio+0x10/0x10
[ 246.970907] write_cache_pages+0x63/0xb0
[ 246.970918] blkdev_writepages+0x57/0x90
[ 246.970927] do_writepages+0x7e/0x270
[ 246.970936] ? update_sd_lb_stats.constprop.0+0x88/0x400
[ 246.970946] __writeback_single_inode+0x44/0x290
[ 246.970953] ? inode_to_bdi+0x3c/0x50
[ 246.970961] writeback_sb_inodes+0x227/0x530
[ 246.970977] __writeback_inodes_wb+0x54/0x100
[ 246.970984] ? queue_io+0x113/0x120
[ 246.970991] wb_writeback+0x28a/0x300
[ 246.970999] wb_do_writeback+0x223/0x2a0
[ 246.971008] wb_workfn+0x4c/0x150
[ 246.971015] process_one_work+0x18d/0x3f0
[ 246.971023] worker_thread+0x304/0x440
[ 246.971030] ? __pfx_worker_thread+0x10/0x10
[ 246.971036] kthread+0xe4/0x110
[ 246.971045] ? __pfx_kthread+0x10/0x10
[ 246.971053] ret_from_fork+0x47/0x70
[ 246.971061] ? __pfx_kthread+0x10/0x10
[ 246.971069] ret_from_fork_asm+0x1a/0x30
[ 246.971079] </TASK>
[ 246.971093] INFO: task md2_reshape:263 blocked for more than 122 seconds.
[ 246.971100] Tainted: G OE 6.9.3-060903-generic #202405300957
[ 246.971106] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.971110] task:md2_reshape state:D stack:0 pid:263 tgid:263 ppid:2 flags:0x00004000
[ 246.971121] Call Trace:
[ 246.971124] <TASK>
[ 246.971128] __schedule+0x279/0x6a0
[ 246.971140] schedule+0x29/0xd0
[ 246.971148] wait_barrier.part.0+0x180/0x1e0 [raid10]
[ 246.971165] ? __pfx_autoremove_wake_function+0x10/0x10
[ 246.971175] wait_barrier+0x70/0xc0 [raid10]
[ 246.971192] raid10_sync_request+0x177e/0x19e3 [raid10]
[ 246.971210] ? __schedule+0x281/0x6a0
[ 246.971221] md_do_sync+0xa36/0x1390
[ 246.971229] ? __pfx_autoremove_wake_function+0x10/0x10
[ 246.971242] ? __pfx_md_thread+0x10/0x10
[ 246.971249] md_thread+0xa5/0x1a0
[ 246.971257] ? __pfx_md_thread+0x10/0x10
[ 246.971263] kthread+0xe4/0x110
[ 246.971271] ? __pfx_kthread+0x10/0x10
[ 246.971279] ret_from_fork+0x47/0x70
[ 246.971286] ? __pfx_kthread+0x10/0x10
[ 246.971294] ret_from_fork_asm+0x1a/0x30
[ 246.971304] </TASK>
[ 246.971310] INFO: task fsck.ext4:800 blocked for more than 122 seconds.
[ 246.971365] Tainted: G OE 6.9.3-060903-generic #202405300957
[ 246.971372] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.971376] task:fsck.ext4 state:D stack:0 pid:800 tgid:800 ppid:790 flags:0x00004002
[ 246.971386] Call Trace:
[ 246.971389] <TASK>
[ 246.971394] __schedule+0x279/0x6a0
[ 246.971405] schedule+0x29/0xd0
[ 246.971414] wait_barrier.part.0+0x180/0x1e0 [raid10]
[ 246.971431] ? __pfx_autoremove_wake_function+0x10/0x10
[ 246.971441] wait_barrier+0x70/0xc0 [raid10]
[ 246.971459] regular_request_wait+0x42/0x1d0 [raid10]
[ 246.971475] ? __kmalloc+0x1c0/0x4e0
[ 246.971483] raid10_write_request+0x164/0x5f0 [raid10]
[ 246.971500] ? r10bio_pool_alloc+0x28/0x40 [raid10]
[ 246.971515] ? r10bio_pool_alloc+0x28/0x40 [raid10]
[ 246.971533] raid10_make_request+0xea/0x1a0 [raid10]
[ 246.971551] md_handle_request+0x15d/0x280
[ 246.971560] md_submit_bio+0x63/0xb0
[ 246.971568] __submit_bio+0xe7/0x1c0
[ 246.971576] __submit_bio_noacct+0x91/0x220
[ 246.971584] submit_bio_noacct_nocheck+0x205/0x240
[ 246.971594] submit_bio_noacct+0x162/0x5a0
[ 246.971602] submit_bio+0xb1/0x110
[ 246.971609] submit_bh_wbc+0x15e/0x190
[ 246.971617] __block_write_full_folio+0x1e3/0x420
[ 246.971626] ? __pfx_blkdev_get_block+0x10/0x10
[ 246.971634] ? __pfx_blkdev_get_block+0x10/0x10
[ 246.971642] block_write_full_folio+0x150/0x180
[ 246.971648] ? __pfx_blkdev_get_block+0x10/0x10
[ 246.971656] ? __pfx_blkdev_get_block+0x10/0x10
[ 246.971663] ? __pfx_block_write_full_folio+0x10/0x10
[ 246.971669] write_cache_pages+0x63/0xb0
[ 246.971679] blkdev_writepages+0x57/0x90
[ 246.971689] do_writepages+0x7e/0x270
[ 246.971700] filemap_fdatawrite_wbc+0x75/0xb0
[ 246.971707] __filemap_fdatawrite_range+0x6d/0xa0
[ 246.971723] file_write_and_wait_range+0x5d/0xc0
[ 246.971731] blkdev_fsync+0x39/0x70
[ 246.971739] vfs_fsync_range+0x4b/0xa0
[ 246.971748] ? __pfx_read_tsc+0x10/0x10
[ 246.971756] __x64_sys_fsync+0x3c/0x70
[ 246.971765] x64_sys_call+0x2485/0x25c0
[ 246.971773] do_syscall_64+0x7e/0x180
[ 246.971785] ? tick_program_event+0x43/0xa0
[ 246.971798] ? hrtimer_interrupt+0x121/0x250
[ 246.971808] ? irqentry_exit_to_user_mode+0x76/0x270
[ 246.971821] ? irqentry_exit+0x43/0x50
[ 246.971831] ? sysvec_apic_timer_interrupt+0x57/0xc0
[ 246.971842] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 246.971852] RIP: 0033:0x70c85631ede4
[ 246.971883] RSP: 002b:00007ffed1aa0258 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
[ 246.971893] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000070c85631ede4
[ 246.971899] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
[ 246.971903] RBP: 00007ffed1aa0270 R08: 000059f82e125d80 R09: 0000000000000000
[ 246.971907] R10: 000059f82e128b74 R11: 0000000000000202 R12: 000059f82e125d80
[ 246.971911] R13: 00000000000002c2 R14: 0000000000000000 R15: 000059f82e128780
[ 246.971919] </TASK>

I really could use some help here. I have no idea where to look for logs or anything else that might provide some clues.

Thanks,
Bill

On Wed, Jun 26, 2024 at 6:33 AM William Morgan <therealbrewer@xxxxxxxxx> wrote:
>
> Is --freeze-reshape of any use here?
>
> Obviously the reshape has crashed; I just want to know what the
> ideal way to resolve this is. I would like to hear your opinions before
> doing anything.
>
> Bill
>
> On Tue, Jun 25, 2024 at 5:18 PM William Morgan <therealbrewer@xxxxxxxxx> wrote:
> >
> > Additional info:
> >
> > bill@bill-desk:~$ sudo cat /proc/242508/stack
> > [<0>] wait_barrier.part.0+0x180/0x1e0 [raid10]
> > [<0>] wait_barrier+0x70/0xc0 [raid10]
> > [<0>] raid10_sync_request+0x177e/0x19e3 [raid10]
> > [<0>] md_do_sync+0xa36/0x1390
> > [<0>] md_thread+0xa5/0x1a0
> > [<0>] kthread+0xe4/0x110
> > [<0>] ret_from_fork+0x47/0x70
> > [<0>] ret_from_fork_asm+0x1a/0x30
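The single-PID check quoted above (`sudo cat /proc/242508/stack`) can be extended to every task stuck in uninterruptible sleep (state D), which is the condition the hung-task warnings report. Below is a minimal Python sketch, assuming a Linux procfs and root privileges (the kernel usually restricts /proc/<pid>/stack to root). The helper names `parse_stat`, `hung_tasks` and `kernel_stack` are hypothetical and not part of mdadm or any existing tool.

```python
#!/usr/bin/env python3
"""Print the kernel stack of every task in uninterruptible sleep (state D).

Sketch only: assumes Linux procfs; reading /proc/<pid>/stack usually needs root.
"""
from pathlib import Path
from typing import Iterator, Tuple

PROC = Path("/proc")


def parse_stat(stat: str) -> Tuple[int, str, str]:
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    pid = int(stat[: stat.index(" ")])
    # comm is wrapped in parentheses and may itself contain ')' or spaces,
    # so the state field is taken from just after the LAST ')'.
    lpar = stat.index("(")
    rpar = stat.rindex(")")
    comm = stat[lpar + 1 : rpar]
    state = stat[rpar + 1 :].split()[0]
    return pid, comm, state


def hung_tasks(proc: Path = PROC) -> Iterator[Tuple[int, str]]:
    """Yield (pid, comm) for each thread-group leader currently in state D."""
    try:
        entries = list(proc.iterdir())
    except OSError:
        return  # no procfs here (not Linux, or not mounted)
    for entry in entries:
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            pid, comm, state = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # task exited while scanning, or unexpected format
        if state == "D":
            yield pid, comm


def kernel_stack(pid: int, proc: Path = PROC) -> str:
    """Return /proc/<pid>/stack, or a short note if it cannot be read."""
    try:
        return (proc / str(pid) / "stack").read_text()
    except OSError as exc:
        return f"<unavailable: {exc.strerror}>\n"


def main() -> None:
    for pid, comm in hung_tasks():
        print(f"== {comm} (pid {pid}) ==")
        print(kernel_stack(pid), end="")


if __name__ == "__main__":
    main()
```

Run as root while the reshape is hung, it should print one stack per blocked task; in the situation above that would presumably include md2_reshape, the flush-9:2 kworker and fsck.ext4 if they are still stuck. It only scans top-level /proc/<pid> entries (thread-group leaders), which covers kernel threads like md2_reshape because each is its own leader; threads under /proc/<pid>/task are not examined.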