Re: Maximizing failed disk replacement on a RAID5 array

Hello Folks,

On Wed, Jun 8, 2011 at 10:21 AM, Brad Campbell
<lists2009@xxxxxxxxxxxxxxx> wrote:
>
> Best of luck, and let us know how you get on.

Just finished the process here. To summarize: it seems I've got my
array back in a stable state.

What I did:

1) Got a good backup of all the data in the array (using "tar") to
   removable HDs, verified it (using md5sum), and then stored these
   HDs safely offline;

2) Unmounted the filesystem in the array;

3) inserted the replacement disk in a USB dock, partitioned it,
   then added it to the array ("mdadm --add");
   -> Verified (via "mdadm --detail") that the replacement disk was
      listed in the array as a "spare";

4) failed the bad disk in the array ("mdadm --fail");
   -> At that point, the array immediately started to resync onto the
      replacement disk;

5) Monitored the resync process via "cat /proc/mdstat": it took
   roughly 11 hours (I guess because transfer speed to the replacement
   disk was limited by the ~40MB/s USB speed limit), but it reported
   no errors;

6) Verified that the array was really synced ("mdadm --detail") and
   that there were indeed no errors during the resync ("less
   /var/log/messages");

7) removed the bad disk logically from the array ("mdadm --remove");

8) shut down the machine (init 0);

9) removed the bad disk physically from the machine, ejected the
   replacement disk from the USB dock, and then installed the
   replacement disk inside the machine;

10) turned the system on: the OS booted, assembled the array and
    mounted the filesystem in it with no issues;

11) checked (using "md5sum -c" on the md5sum files generated during
    step 1 above) that all the data ON THE ARRAY was indeed correct,
    so in the end I didn't need to restore anything from backup.
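
A rough sketch of steps 3 to 7, plus the checksum check from steps 1
and 11, as a shell script. The exact command lines are not given
above, so everything specific here is an assumption made up for
illustration: the device names /dev/md0, /dev/sdb1 and /dev/sdd1, the
DRY_RUN switch, and the helper names run, start_rebuild,
finish_rebuild, make_manifest and verify_manifest. It also assumes GNU
md5sum (for "--quiet").

```shell
#!/bin/sh
# Hypothetical names, NOT from the original post. Adjust before use.
MD=${MD:-/dev/md0}       # assumed array device
BAD=${BAD:-/dev/sdb1}    # assumed failing member
NEW=${NEW:-/dev/sdd1}    # assumed replacement partition (in the USB dock)
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 3 to 5: add the new disk as a spare, fail the bad one, look at
# the rebuild. Re-run "cat /proc/mdstat" by hand until the rebuild ends.
start_rebuild() {
    run mdadm "$MD" --add "$NEW" &&
    run mdadm --detail "$MD" &&
    run mdadm "$MD" --fail "$BAD" &&
    run cat /proc/mdstat
}

# Steps 6 and 7: only once the rebuild has finished.
finish_rebuild() {
    run mdadm --detail "$MD" &&
    run mdadm "$MD" --remove "$BAD"
}

# Step 1: write an md5sum manifest for every file under directory $1.
# $2 must be an absolute path outside of $1.
make_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} + ) > "$2"
}

# Step 11: re-check directory $1 against manifest $2 (absolute path).
# Returns non-zero if any file is missing or differs.
verify_manifest() {
    ( cd "$1" && md5sum --quiet -c "$2" )
}
```

With DRY_RUN left at 1, start_rebuild and finish_rebuild only print
the commands; nothing touches the array until DRY_RUN=0 is set. The
split into two functions is deliberate, so that nothing is removed
before the rebuild has been checked by hand.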

Thanks for all the help, folks, and I pray we have the "hot-replace"
functionality implemented soon... it will make for much sounder sleep
the next time one of my disks fails... :-)

Cheers,
--
  Durval Menezes.