On Sun, 8 Aug 2004, Robin Bowes wrote:

> Hi,
>
> This question came up in another thread, but buried at the end, so I
> thought it would be worth pulling out and asking explicitly.
>
> I have a 6-disk RAID5 array made up of 6 x 250GB Maxtor SATA drives
> (5 + 1 hot spare).
>
> Suppose one fails. What is the process I need to follow to replace
> the faulty disk?

This is what I did recently on a server with 4 disks on 2 SCSI busses
(Dell 24xx box IIRC) when /dev/sda failed. Each of the 4 disks is
partitioned identically into 6 partitions, each partition being a
slice of a RAID array.

Remove the failed device from the arrays:

  raid-hot-remove /dev/md0 /dev/sda1
  raid-hot-remove /dev/md1 /dev/sda2
  raid-hot-remove /dev/md2 /dev/sda3
  raid-hot-remove /dev/md3 /dev/sda5
  raid-hot-remove /dev/md4 /dev/sda6
  raid-hot-remove /dev/md5 /dev/sda7

Only one md device had actually failed, but it was necessary to
degrade all the arrays to replace the drive.

Remove the failed device from the kernel:

  echo "scsi remove-single-device 0 0 ? 0" > /proc/scsi/scsi

The ? was 0 in this case.

Physically unplug the drive from the system.

Note: the system was live, running, and serving files during this
entire process... The Dell has 80-pin SCA-style connectors, so I
guessed it would be OK. Dell has some weird active backplane that
appears as a SCSI device that I'm sure you can do "stuff" with, but
this is a stock 2.4.26 kernel and Debian Woody.

Plug the new drive in and tell the kernel about it:

  echo "scsi add-single-device 0 0 ? 0" > /proc/scsi/scsi

Use cfdisk to partition it, using one of the other disks as a
reference.

Add it back into the RAID arrays:

  raid-hot-add /dev/md0 /dev/sda1
  raid-hot-add /dev/md1 /dev/sda2
  raid-hot-add /dev/md2 /dev/sda3
  raid-hot-add /dev/md3 /dev/sda5
  raid-hot-add /dev/md4 /dev/sda6
  raid-hot-add /dev/md5 /dev/sda7

which starts the rebuild on each partition in turn.

Finally, re-run LILO to put the boot blocks back on (/dev/sda is one
of the boot disks).

Later, at a quiet time, reboot the server to make sure it will boot
OK!

> Here's my best guess so far:
>
> (assume /dev/sdc has failed)
>
> Shut down server.
> Pull dead drive.
> Insert new drive.
> Boot up server.
> Create partition table on new drive (all my drives are partitioned
> identically):
>
>   # sfdisk -d /dev/sda | sfdisk /dev/sdc

Hm. Never heard of sfdisk - that's handy to copy a partition table!

> (Is it necessary to explicitly "remove" the failed device from the
> arrays (before shutting down?) and to add it back in after replacing
> the disk?)
>
> For example, would this work?
>
>   # mdadm /dev/md5 -f /dev/sdc2 -r /dev/sdc2 -a /dev/sdc2

Hm. mdadm. One of these days I'll get round to reading its man page...

Gordon
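P.S. A few follow-ups while I think of them.

The four numbers in the "scsi remove-single-device" and
"scsi add-single-device" lines above are host, channel, id, and lun.
You can read them off for each attached disk with:

  cat /proc/scsi/scsi

which prints a "Host: scsi0 Channel: 00 Id: 00 Lun: 00" stanza (plus
vendor/model) for every device the kernel knows about.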
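On sfdisk: after copying the table over, a quick sanity check that the
new disk really matches the reference one might look like this -
untested here, just the obvious combination of sfdisk -d and diff,
with the device names stripped so the two dumps compare cleanly:

  # dump both partition tables, minus the device names
  sfdisk -d /dev/sda | sed 's,/dev/sda,,' > /tmp/ref
  sfdisk -d /dev/sdc | sed 's,/dev/sdc,,' > /tmp/new
  # no output from diff means the layouts are identical
  diff /tmp/ref /tmp/new && echo "partition tables match"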
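On Robin's mdadm question: I haven't used it myself (see above about
the man page), but going by the syntax he quotes, the equivalent of
the raidtools dance for one array would presumably be:

  # mark the failed slice faulty (if the kernel hasn't already),
  # then pull it out of the array
  mdadm /dev/md0 -f /dev/sda1
  mdadm /dev/md0 -r /dev/sda1

  # after the physical swap and repartitioning, add the new slice
  # back in, which kicks off the resync
  mdadm /dev/md0 -a /dev/sda1

repeated for /dev/md1 through /dev/md5 with the matching partitions.
Someone who actually runs mdadm should confirm.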
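Oh, and whichever tool you use, you can keep an eye on the rebuilds as
they work through each array - /proc/mdstat shows the resync progress:

  cat /proc/mdstat

  # or poll it every few seconds
  watch -n 5 cat /proc/mdstat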