Hi,

On 2023/04/24 3:09, Jove wrote:
Hi,

I've added two drives to my raid5 array and tried to migrate it to raid6 with the following command:

    mdadm --grow /dev/md0 --raid-devices 4 --level 6 --backup-file=/root/mdadm_raid6_backup.md

This may have been my first mistake, as there are only 5 drives. It should have been --raid-devices 3, I think.

As soon as I started this grow, the filesystems became unavailable. All processes trying to access files on them hung. I searched the web, which said a reboot during a rebuild was not problematic if things shut down cleanly, so I rebooted. The reboot hung too. The drive activity continued, so I let it run overnight. I woke up to a rebooted system in emergency mode, as it could not mount all the partitions on the raid array.

The OS tried to reassemble the array and succeeded. However, the udev processes that try to create the dev entries hang.

I went back to Google and found out how I could reboot my system without this automatic assemble. I tried reassembling the array with:

    mdadm --verbose --assemble --backup-file mdadm_raid6_backup.md0 /dev/md0

This failed with:

    No backup metadata on mdadm_raid6_backup.md0
    Failed to find final backup of critical section.
    Failed to restore critical section for reshape, sorry.

I tried again with:

    mdadm --verbose --assemble --backup-file mdadm_raid6_backup.md0 --invalid-backup /dev/md0

This said, in addition to the lines above:

    continuying without restoring backup

This seemed to have succeeded in reassembling the array, but it also hangs indefinitely.

/proc/mdstat now shows:

    md0 : active (read-only) raid6 sdc1[0] sde[4](S) sdf[5] sdd1[3] sdg1[1]
          7813771264 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
          bitmap: 1/30 pages [4KB], 65536KB chunk

A read-only array cannot continue the reshape. See md_check_recovery() for details: the reshape can only start if the md_is_rdwr(mddev) check passes. Do you know why this array is read-only?
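
As a rough illustration of the check described here, the read-only flag can be picked out of /proc/mdstat programmatically. The following is only a minimal sketch: parse_mdstat is a hypothetical helper (not part of mdadm or the kernel), and it assumes the header and status line layout seen in the output quoted above.

```python
import re

# Sample copied from the /proc/mdstat output quoted in this thread.
SAMPLE = """\
md0 : active (read-only) raid6 sdc1[0] sde[4](S) sdf[5] sdd1[3] sdg1[1]
      7813771264 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUU_]
      bitmap: 1/30 pages [4KB], 65536KB chunk
"""


def parse_mdstat(text):
    """Hypothetical helper: map each md array name to its read-only flag
    and the number of missing members ('_' slots in the [UUU_] field)."""
    arrays = {}
    current = None
    for line in text.splitlines():
        # Array header, e.g. "md0 : active (read-only) raid6 ..."
        header = re.match(r"^(md\S+)\s*:\s*(.*)$", line)
        if header:
            current = header.group(1)
            arrays[current] = {
                "read_only": "(read-only)" in header.group(2),
                "missing": 0,
            }
            continue
        # Status field, e.g. "[4/3] [UUU_]"; the bitmap line does not match.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status:
            arrays[current]["missing"] = status.group(3).count("_")
    return arrays


if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE).items():
        print(name, info)
```

On a live system the text would come from reading /proc/mdstat instead of SAMPLE. An array reported with read_only set to True is one where, per the explanation above, the reshape will not make progress.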

Again the udev processes trying to access this device hung indefinitely. Eventually, the kernel dumps this in my journal:

    Apr 23 19:17:22 atom kernel: task:systemd-udevd state:D stack: 0 pid: 8121 ppid: 706 flags:0x00000006
    Apr 23 19:17:22 atom kernel: Call Trace:
    Apr 23 19:17:22 atom kernel: <TASK>
    Apr 23 19:17:22 atom kernel: __schedule+0x20a/0x550
    Apr 23 19:17:22 atom kernel: schedule+0x5a/0xc0
    Apr 23 19:17:22 atom kernel: schedule_timeout+0x11f/0x160
    Apr 23 19:17:22 atom kernel: ? make_stripe_request+0x284/0x490 [raid456]
    Apr 23 19:17:22 atom kernel: wait_woken+0x50/0x70

It looks like this normal I/O is waiting for the reshape to finish, which is why it hung indefinitely. This really is a kernel bug. Perhaps it can be bypassed if the reshape can complete, which will hopefully happen automatically once the array is read-write.

Note: never echo reshape to sync_action; this will corrupt data in your case.

Thanks,
Kuai