On 9 October 2024 11:09:40 BST, Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote:
>On Sun, 06 Oct 2024 07:00:18 +0100
>19 Devices <19devices@xxxxxxxxx> wrote:
>
>> Hi, I have a 4 drive imsm RAID 5 array which is working fine. I want to
>> remove one of the drives, sda, and replace it with a spare, sdc. From man
>> mdadm I understand that add - fail - remove is the way to go but this does
>> not work.
>>
>> Before:
>> $ cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md124 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
>>       2831155200 blocks super external:/md126/0 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]
>>
>> md125 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
>>       99116032 blocks super external:/md126/1 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]
>>
>> md126 : inactive sda[3](S) sdb[2](S) sdd[1](S) sde[0](S)
>>       14681 blocks super external:imsm
>>
>> unused devices: <none>
>>
>> I can add (or add-spare) which increases the size of the container and though
>> I can't see any spare drives listed by mdadm, it appears as SPARE DISK in the
>> Intel option ROM after a reboot.
>>
>> $ sudo mdadm --zero-superblock /dev/sdc
>>
>> $ sudo mdadm /dev/md/imsm1 --add-spare /dev/sdc
>> mdadm: added /dev/sdc
>>
>> $ cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md124 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
>>       2831155200 blocks super external:/md126/0 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]
>>
>> md125 : active raid5 sdd[3] sdb[2] sda[1] sde[0]
>>       99116032 blocks super external:/md126/1 level 5, 128k chunk, algorithm 0 [4/4] [UUUU]
>>
>> md126 : inactive sdc[4](S) sda[3](S) sdb[2](S) sdd[1](S) sde[0](S)
>>       15786 blocks super external:imsm
>>
>> unused devices: <none>
>> $
>>
>> No spare devices listed here:
>>
>> $ sudo mdadm -D /dev/md/imsm1
>> /dev/md/imsm1:
>>            Version : imsm
>>         Raid Level : container
>>      Total Devices : 5
>>
>>    Working Devices : 5
>>
>>               UUID : bdb7f495:21b8c189:e496c216:6f2d6c4c
>>      Member Arrays : /dev/md/md1_0 /dev/md/md0_0
>>
>>     Number   Major   Minor   RaidDevice
>>
>>        -       8       64        -        /dev/sde
>>        -       8       32        -        /dev/sdc
>>        -       8        0        -        /dev/sda
>>        -       8       48        -        /dev/sdd
>>        -       8       16        -        /dev/sdb
>> $
>>
>Hello,
>
>I know. It is fine. From the container's point of view these all are spares.
>Nobody ever complained about that so we did not fix it :)
>The most important thing is that all drives are here.
>
>To detect spares you must compare this list with the list from #mdadm --detail
>/dev/md124 (member array). Drives that are not used in the member array are spares.
>
>> Trying to remove sda fails.
>>
>> $ sudo mdadm --fail /dev/md126 /dev/sda
>> mdadm: Cannot remove /dev/sda from /dev/md126, array will be failed.
>
>It might be an issue in mdadm; we added this and later we added fixes:
>
>Commit:
>https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=fc6fd4063769f4194c3fb8f77b32b2819e140fb9
>
>Fixes:
>https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=b3e7b7eb1dfedd7cbd9a3800e884941f67d94c96
>https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=461fae7e7809670d286cc19aac5bfa861c29f93a
>
>but your release is mdadm-4.3, all fixes should be there. It might be a new bug.
>
>Try:
>#mdadm -If sda
>but please do not abuse it (just use it one time because it may fail your
>array). According to mdstat it should be safe in this case.
>
>If you can do some investigation, I would be thankful, I expect issues
>in the enough() function.
>
>Thanks,
>Mariusz
>
>>
>> sda is 2TB, the others are 1TB - is that a problem?
>>
>> smartctl shows 2 drives don't support SCT and it's disabled on the other 3.
>>
>> There's a very similar question here from Edwin in 2017:
>> https://unix.stackexchange.com/questions/372908/add-hot-spare-drive-to-intel-rst-onboard-raid#372920
>>
>> The only reply points to an Intel doc which uses the standard command to add
>> a drive but doesn't show the result.
>>
>> $ uname -a
>> Linux Intel 6.9.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 26 May 2024 01:30:29 +0000 x86_64 GNU/Linux
>>
>> $ mdadm --version
>> mdadm - v4.3 - 2024-02-15
>>
>

---------------------------------------

Thank you Mariusz, that (--incremental --fail) worked:

# mdadm -If sda
mdadm: set sda faulty in md124
mdadm: set sda faulty in md125
mdadm: hot removed sda from md126

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md124 : active raid5 sdc[4] sdd[3] sdb[2] sde[0]
      2831155200 blocks super external:/md126/0 level 5, 128k chunk, algorithm 0 [4/3] [UU_U]
      [>....................]  recovery =  0.2% (2275456/943718400) finish=222.5min speed=70515K/sec

md125 : active raid5 sdc[4] sdd[3] sdb[2] sde[0]
      99116032 blocks super external:/md126/1 level 5, 128k chunk, algorithm 0 [4/3] [UU_U]
        resync=DELAYED

md126 : inactive sdc[4](S) sdb[2](S) sdd[1](S) sde[0](S)
      10585 blocks super external:imsm

unused devices: <none>
#

# journalctl -f
kernel: md/raid:md124: Disk failure on sda, disabling device.
kernel: md/raid:md124: Operation continuing on 3 devices.
kernel: md/raid:md125: Disk failure on sda, disabling device.
kernel: md/raid:md125: Operation continuing on 3 devices.
kernel: md: recovery of RAID array md124
kernel: md: delaying recovery of md125 until md124 has finished (they share one or more physical units)
mdadm[628]: mdadm: Fail event detected on md device /dev/md125, component device /dev/sda
mdadm[628]: mdadm: RebuildStarted event detected on md device /dev/md124
mdadm[628]: mdadm: Fail event detected on md device /dev/md124, component device /dev/sda
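For the record, this is roughly how I checked which drive the container still treats as a spare, following your suggestion above to compare the container listing with the member array listing. It is only a quick sketch I put together (run as root); the grep/sort/comm pipeline is my own, and the names (/dev/md126 = container, /dev/md124 = member array) are the ones on my machine:

# Drives the container knows about vs. drives the member array actually uses;
# anything listed only for the container is effectively a spare.
mdadm --detail /dev/md126 | grep -o '/dev/sd[a-z]*' | sort -u > /tmp/container-disks
mdadm --detail /dev/md124 | grep -o '/dev/sd[a-z]*' | sort -u > /tmp/member-disks
comm -23 /tmp/container-disks /tmp/member-disks    # prints the spare(s), if any

Before the swap this should have printed /dev/sdc (in the container but not in md124); now that sdc is rebuilding into md124 it should print nothing.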
---------------------------------------

ps. Belated thanks too for your solution to my previous problem here on 2021/08/02. That fix showed no sign of having succeeded until reboot, but after that all was fine.
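pps. In case it saves someone else a search, here is the whole sequence that worked for me, condensed. Treat it as a sketch rather than a recipe: the device names (sda = drive being replaced, sdc = replacement, /dev/md/imsm1 = the imsm container) are specific to my setup, and as the output above shows, mdadm -If fails the drive in every member array of the container at once, so per Mariusz's warning only use it when the arrays can survive losing that drive.

mdadm --zero-superblock /dev/sdc           # wipe any stale metadata on the new drive
mdadm /dev/md/imsm1 --add-spare /dev/sdc   # add it to the imsm container; it shows up as (S) in /proc/mdstat
mdadm -If sda                              # --incremental --fail: fail sda in the member arrays and hot-remove it from the container
cat /proc/mdstat                           # recovery onto sdc starts automatically; md125 is delayed until md124 finishes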