Hi Kuai,

The mdadm --assemble command also hangs in the kernel. It never completes.

root 142 112 1 19:01 tty1 00:00:00 mdadm --assemble /dev/md0 /dev/ubdb /dev/ubdc /dev/ubdd /dev/ubde --backup-file mdadm_raid6_backup.md0 --invalid-backup
root 145 2 0 19:01 ? 00:00:00 [md0_raid6]

[root@LXCNAME ~]# cat /proc/142/stack
[<0>] __switch_to+0x50/0x7f
[<0>] __schedule+0x39c/0x3dd
[<0>] schedule+0x78/0xb9
[<0>] mddev_suspend+0x10b/0x1e8
[<0>] suspend_lo_store+0x72/0xbb
[<0>] md_attr_store+0x6c/0x8d
[<0>] sysfs_kf_write+0x34/0x37
[<0>] kernfs_fop_write_iter+0x167/0x1d0
[<0>] new_sync_write+0x68/0xd8
[<0>] vfs_write+0xe7/0x12b
[<0>] ksys_write+0x6d/0xa6
[<0>] sys_write+0x10/0x12
[<0>] handle_syscall+0x81/0xb1
[<0>] userspace+0x3db/0x598
[<0>] fork_handler+0x94/0x96

[root@LXCNAME ~]# cat /proc/145/stack
[<0>] __switch_to+0x50/0x7f
[<0>] __schedule+0x39c/0x3dd
[<0>] schedule+0x78/0xb9
[<0>] schedule_timeout+0xd2/0xfb
[<0>] md_thread+0x12c/0x18a
[<0>] kthread+0x11d/0x122
[<0>] new_thread_handler+0x81/0xb2

I have had one case in which mdadm didn't hang and the reshape
continued. Sadly, I was using sparse overlay files and the filesystem
could not handle the full 4x 4TB, so I had to terminate the reshape.

Best regards,

Johan

On Thu, May 4, 2023 at 1:41 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2023/04/24 3:09, Jove wrote:
> > Hi,
> >
> > I've added two drives to my raid5 array and tried to migrate
> > it to raid6 with the following command:
> >
> > mdadm --grow /dev/md0 --raid-devices 4 --level 6
> > --backup-file=/root/mdadm_raid6_backup.md
> >
> > This may have been my first mistake, as there are only 5
> > drives. It should have been --raid-devices 3, I think.
> >
> > As soon as I started this grow, the filesystems became
> > unavailable. All processes trying to access files on it hung.
> > I searched the web which said a reboot during a rebuild
> > was not problematic if things shut down cleanly, so I
> > rebooted. The reboot hung too.
> > The drive activity continued so I let it run overnight. I did
> > wake up to a rebooted system in emergency mode as it could not
> > mount all the partitions on the raid array.
> >
> > The OS tried to reassemble the array and succeeded.
> > However the udev processes that try to create the dev
> > entries hang.
> >
> > I went back to Google and found out how I could reboot
> > my system without this automatic assemble.
> > I tried reassembling the array with:
> >
> > mdadm --verbose --assemble --backup-file mdadm_raid6_backup.md0 /dev/md0
> >
> > This failed with:
> > No backup metadata on mdadm_raid6_backup.md0
> > Failed to find final backup of critical section.
> > Failed to restore critical section for reshape, sorry.
> >
> > I tried again with:
> >
> > mdadm --verbose --assemble --backup-file mdadm_raid6_backup.md0
> > --invalid-backup /dev/md0
> >
> > This said in addition to the lines above:
> >
> > continuing without restoring backup
> >
> > This seemed to have succeeded in reassembling the
> > array but it also hangs indefinitely.
> >
> > /proc/mdstat now shows:
> >
> > md0 : active (read-only) raid6 sdc1[0] sde[4](S) sdf[5] sdd1[3] sdg1[1]
> > 7813771264 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
> > bitmap: 1/30 pages [4KB], 65536KB chunk
>
> A read-only array can't continue the reshape. See md_check_recovery()
> for details: the reshape can only start if md_is_rdwr(mddev) passes.
> Do you know why this array is read-only?
>
> >
> > Again the udev processes trying to access this device hung indefinitely
> >
> > Eventually, the kernel dumps this in my journal:
> >
> > Apr 23 19:17:22 atom kernel: task:systemd-udevd state:D stack: 0
> > pid: 8121 ppid: 706 flags:0x00000006
> > Apr 23 19:17:22 atom kernel: Call Trace:
> > Apr 23 19:17:22 atom kernel: <TASK>
> > Apr 23 19:17:22 atom kernel: __schedule+0x20a/0x550
> > Apr 23 19:17:22 atom kernel: schedule+0x5a/0xc0
> > Apr 23 19:17:22 atom kernel: schedule_timeout+0x11f/0x160
> > Apr 23 19:17:22 atom kernel: ? make_stripe_request+0x284/0x490 [raid456]
> > Apr 23 19:17:22 atom kernel: wait_woken+0x50/0x70
>
> It looks like this normal I/O is waiting for the reshape to finish,
> which is why it hangs indefinitely.
>
> This really is a kernel bug. Perhaps it can be bypassed if the reshape
> can complete, hopefully automatically once the array is read-write.
> Note: never echo reshape to sync_action, this will corrupt data in
> your case.
>
> Thanks,
> Kuai
>
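
For reference, here is a minimal Python sketch of one way to check whether /proc/mdstat marks an array as read-only, the state that Kuai says keeps the reshape from continuing. The helper name read_only_arrays and the regular expression are illustrative assumptions, not part of mdadm or the kernel. The sample text is the mdstat output quoted above, with its continuation lines re-indented by hand.

```python
import re

# Matches an array header line in /proc/mdstat, for example:
#   md0 : active (read-only) raid6 sdc1[0] sde[4](S) ...
# Group 3 is set only when the array is flagged read-only or auto-read-only.
_HEADER = re.compile(
    r"^(md\S*)\s*:\s*(active|inactive)(?:\s+\((auto-read-only|read-only)\))?"
)


def read_only_arrays(mdstat_text):
    """Return {array_name: mode} for arrays that mdstat flags as read-only.

    Hypothetical helper for illustration; it only inspects text and does
    not touch the array.
    """
    result = {}
    for line in mdstat_text.splitlines():
        match = _HEADER.match(line.strip())
        if match and match.group(3):
            result[match.group(1)] = match.group(3)
    return result


# Sample taken from the mdstat output quoted in this thread.
SAMPLE = """\
md0 : active (read-only) raid6 sdc1[0] sde[4](S) sdf[5] sdd1[3] sdg1[1]
      7813771264 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
      bitmap: 1/30 pages [4KB], 65536KB chunk
"""

if __name__ == "__main__":
    print(read_only_arrays(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mdstat instead of SAMPLE. It only reports the flag and does not change the array state.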