Hi, I'm somewhat new to Linux and mdadm, although I've certainly learnt a lot over the last 24 hours. I have a SuperMicro server running CentOS 7 (3.10.0-1160.11.1.el7.x86_64) with mdadm 4.1 (2018-10-01) that was happily running with 30 x 8TB disks in a RAID6 configuration. (It also has boot and root on a RAID1 array; the RAID6 array is solely for data.) It was, however, starting to run out of space, so I investigated adding more drives to the array (it can hold a total of 45 drives). Since this server is no longer under support, obtaining the same drives as it already contained wasn't an option, and the supplier couldn't guarantee that they could supply compatible drives. We did come to an arrangement whereby I would try one drive and, if it didn't work, I could return any unopened units. I spent ages ensuring that the ones he'd suggested were as compatible as possible, and I based the specs of the existing drives off the invoice for the entire system. This turned out to be a mistake: the invoice stated they were 512e drives but, as I discovered after the new drives had arrived and I was doing a final check, the existing drives are actually 4Kn (4096-byte native sector) drives. Of course the new drives were 512e. Bother!

After a lot more reading I found that it might be possible to reformat the new drives from 512e to 4Kn using sg_format. I installed the test drive and proceeded to see if it was possible to format it to 4096-byte sectors with:

    sg_format --size=4096 /dev/sd<x>

All was proceeding smoothly when my ssh session terminated because a faulty docking station killed my Ethernet connection. So I logged onto the console and restarted the sg_format, which completed OK, sort of: it did convert the disk to 4096-byte sectors, but it threw an I/O error or two. They didn't seem too concerning, and I figured that if there was a problem it would show up in the next couple of steps. I've since discovered the dmesg log, which indicates there were significantly more I/O errors than I'd thought.

Anyway, since sg_format appeared to complete OK, I moved on to the next stage, which was to partition the disk with the following commands:

    parted -a optimal /dev/sd<x>
    (parted) mklabel msdos
    (parted) mkpart primary 2048s 100%    (need to check that the start is correct)
    (parted) align-check optimal 1        (verify alignment of partition 1)
    (parted) set 1 raid on                (set the RAID flag)
    (parted) print

Unfortunately, I don't have the output of the print command, as my laptop unexpectedly shut down overnight (it hasn't been a good weekend), but the partitioning appeared to complete without incident.

I then added the new disk to the array:

    mdadm --add /dev/md125 /dev/sd<x>

and it completed without any problems. I then proceeded to grow the array:

    mdadm --grow --raid-devices=31 --backup-file=/grow_md125.bak /dev/md125

I monitored this with cat /proc/mdstat, and it showed that it was reshaping, but the speed was 0K/sec and the reshape didn't progress from 0%.

cat /proc/mdstat produced:

    Personalities : [raid1] [raid6] [raid5] [raid4]
    md125 : active raid6 sdab1[30] sdw1[26] sdc1[6] sdm1[16] sdi1[12] sdz1[29] sdh1[11] sdg1[10] sds1[22] sdf1[9] sdq1[20] sdaa1[1] sdo1[18] sdu1[24] sdb1[5] sdae1[4] sdl1[15] sdj1[13] sdn1[17] sdp1[19] sdv1[25] sde1[8] sdd1[7] sdr1[21] sdt1[23] sdx1[27] sdad1[3] sdac1[2] sdy1[28] sda1[0] sdk1[14]
          218789036032 blocks super 1.2 level 6, 512k chunk, algorithm 2 [31/31] [UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
          [>....................]  reshape =  0.0% (1/7813894144) finish=328606806584.3min speed=0K/sec
          bitmap: 0/59 pages [0KB], 65536KB chunk

    md126 : active raid1 sdaf1[0] sdag1[1]
          100554752 blocks super 1.2 [2/2] [UU]
          bitmap: 1/1 pages [4KB], 65536KB chunk

    md127 : active raid1 sdaf3[0] sdag2[1]
          976832 blocks super 1.0 [2/2] [UU]
          bitmap: 0/1 pages [0KB], 65536KB chunk

    unused devices: <none>

mdadm --detail /dev/md125 produced:

    /dev/md125:
               Version : 1.2
         Creation Time : Wed Sep 13 15:09:40 2017
            Raid Level : raid6
            Array Size : 218789036032 (203.76 TiB 224.04 TB)
         Used Dev Size : 7813894144 (7.28 TiB 8.00 TB)
          Raid Devices : 31
         Total Devices : 31
           Persistence : Superblock is persistent

         Intent Bitmap : Internal

           Update Time : Sun May 8 00:47:35 2022
                 State : clean, reshaping
        Active Devices : 31
       Working Devices : 31
        Failed Devices : 0
         Spare Devices : 0

                Layout : left-symmetric
            Chunk Size : 512K

    Consistency Policy : bitmap

        Reshape Status : 0% complete
         Delta Devices : 1, (30->31)

                  Name : localhost.localdomain:SW-RAID6
                  UUID : f9b65f55:5f257add:1140ccc0:46ca6c19
                Events : 1053617

        Number   Major   Minor   RaidDevice State
           0       8        1        0      active sync   /dev/sda1
           1      65      161        1      active sync   /dev/sdaa1
           2      65      193        2      active sync   /dev/sdac1
           3      65      209        3      active sync   /dev/sdad1
           4      65      225        4      active sync   /dev/sdae1
           5       8       17        5      active sync   /dev/sdb1
           6       8       33        6      active sync   /dev/sdc1
           7       8       49        7      active sync   /dev/sdd1
           8       8       65        8      active sync   /dev/sde1
           9       8       81        9      active sync   /dev/sdf1
          10       8       97       10      active sync   /dev/sdg1
          11       8      113       11      active sync   /dev/sdh1
          12       8      129       12      active sync   /dev/sdi1
          13       8      145       13      active sync   /dev/sdj1
          14       8      161       14      active sync   /dev/sdk1
          15       8      177       15      active sync   /dev/sdl1
          16       8      193       16      active sync   /dev/sdm1
          17       8      209       17      active sync   /dev/sdn1
          18       8      225       18      active sync   /dev/sdo1
          19       8      241       19      active sync   /dev/sdp1
          20      65        1       20      active sync   /dev/sdq1
          21      65       17       21      active sync   /dev/sdr1
          22      65       33       22      active sync   /dev/sds1
          23      65       49       23      active sync   /dev/sdt1
          24      65       65       24      active sync   /dev/sdu1
          25      65       81       25      active sync   /dev/sdv1
          26      65       97       26      active sync   /dev/sdw1
          27      65      113       27      active sync   /dev/sdx1
          28      65      129       28      active sync   /dev/sdy1
          29      65      145       29      active sync   /dev/sdz1
          30      65      177       30      active sync   /dev/sdab1

NOTE: the new disk is /dev/sdab.

About 12 hours later, as the reshape hadn't progressed from 0%, I looked at ways of aborting it, such as mdadm --stop /dev/md125, which didn't work, so I ended up rebooting the server, and this is where things really went pear-shaped. The server came up in emergency mode, which I found odd given that boot and root should have been OK. I was able to log on as root, but the RAID6 array was stuck in the reshape state.

I then tried:

    mdadm --assemble --update=revert-reshape --backup-file=/grow_md125.bak --verbose --uuid=f9b65f55:5f257add:1140ccc0:46ca6c19 /dev/md125

and this produced:

    mdadm: No super block found on /dev/sde (Expected magic a92b4efc, got <varying numbers>)
    mdadm: No RAID super block on /dev/sde
    .
    .
    mdadm: /dev/sde1 is identified as a member of /dev/md125, slot 6
    .
    .
    mdadm: /dev/md125 has an active reshape - checking if critical section needs to be restored
    mdadm: No backup metadata on /grow_md125.back
    mdadm: Failed to find backup of critical section
    mdadm: Failed to restore critical section for reshape, sorry.

I've tried different variations on this, including mdadm --assemble --invalid-backup --force, but I won't include all the different commands here because I'm having to type all this by hand, since I can't copy anything off the server while it's in emergency mode. I have also removed the suspect disk, but this hasn't made any difference.
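In case it helps with diagnosis, I can also collect what each member's superblock currently reports. Something along these lines should pull out the relevant fields (the device globs here are only an example and would need adjusting to the real member names on my system):

    # dump the interesting superblock fields for each RAID6 member partition
    for d in /dev/sd[a-z]1 /dev/sda[a-f]1; do
        echo "== $d =="
        mdadm --examine "$d" | grep -E 'Update Time|Events|Reshape pos|Array State'
    done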
But the closest I've come to fixing this is running:

    mdadm /dev/md125 --assemble --invalid-backup --backup-file=/grow_md125.bak --verbose /dev/sdc1 /dev/sdd1 ....... /dev/sdaf1

and this produces:

    .
    .
    .
    mdadm: /dev/sdaf1 is identified as a member of /dev/md125, slot 4.
    mdadm: /dev/md125 has an active reshape - checking if critical section needs to be restored
    mdadm: No backup metadata on /grow_md125.back
    mdadm: Failed to find backup of critical section
    mdadm: continuing without restoring backup
    mdadm: added /dev/sdac1 to /dev/md125 as 1
    .
    .
    .
    mdadm: failed to RUN_ARRAY /dev/md125: Invalid argument

dmesg has this information:

    md: md125 stopped.
    md/raid:md125: reshape_position too early for auto-recovery - aborting.
    md: pers->run() failed ...
    md: md125 stopped.

If you've stuck with me and read all this way, thank you, and I hope you can help me.

Regards,
Bob Brand
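P.S. In case it matters, when I say 512e vs 4Kn I'm going by the logical/physical sector sizes reported by the usual tools, for example (the device name is just a placeholder):

    # any of these should show logical and physical sector sizes
    lsblk -o NAME,LOG-SEC,PHY-SEC,SIZE,MODEL /dev/sd<x>
    blockdev --getss --getpbsz /dev/sd<x>
    cat /sys/block/sd<x>/queue/logical_block_size /sys/block/sd<x>/queue/physical_block_size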