RE: raid5 recover after a 2 disk failure

To clarify, all I want to do is temporarily mount the array so I can copy off as much data as possible, then blow the entire array away and find out whether sdc really is bad or whether it's just a bad block, a bad cable, or whatever. I get the feeling that sdc is mostly fine, in which case I should be able to recover most of the data on the array.
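
For what it's worth, the rough sequence I had in mind looks something like this (just a sketch -- it assumes the array is still /dev/md1, that the device names haven't shuffled since the mdstat quoted below, and that a mount point like /mnt/recovery exists; the spares sdc1 and sdf1 are deliberately left out so nothing starts rebuilding):

  mdadm --stop /dev/md1
  # force-assemble the four members that were last in sync, read-only
  mdadm --assemble --force --readonly /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdd1 /dev/sde1
  # mount the filesystem read-only and copy off what I can
  mount -o ro /dev/md1 /mnt/recovery

Does that look roughly right?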

Also, is it possible to set these disks read-only so that mdadm won't write to them no matter what I do? That would make me feel a lot better when trying various options to force mdadm into mounting them.
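
If flagging the block devices themselves read-only is the way to do it, something like this is what I was picturing (again only a sketch -- I don't know whether md honours the flag in every code path, so I'd still use --readonly on the assemble as well):

  # mark a member disk read-only at the block layer
  blockdev --setro /dev/sde
  # verify: --getro prints 1 when the device is read-only
  blockdev --getro /dev/sde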

Thanks, Frank


From: "frank jenkins" <fjenkins873@xxxxxxxxxxx>
To: linux-raid@xxxxxxxxxxxxxxx
Subject: raid5 recover after a 2 disk failure
Date: Sun, 17 Jun 2007 06:57:55 +0000

I have a 5 disk raid5 array that had a disk failure. I removed the disk, added a new one (and a spare), and recovery began. Halfway through recovery, a second disk failed.

However, while the first disk really was dead, the failure on the second seems to have been a transient error, as the SMART data and disk testing suggest the disk is fine.
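
(For reference, this is roughly what I ran to check it, assuming smartmontools is installed and sde still has that name:)

  # overall health verdict plus the full attribute table
  smartctl -H /dev/sde
  smartctl -a /dev/sde
  # kick off a short self-test, then read the results afterwards
  smartctl -t short /dev/sde
  smartctl -l selftest /dev/sde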

The question is, how can I tell mdadm to unfail this second disk? From what I've found in the archives, I think I need to use the --force option, but I'm concerned about getting the device names in the wrong order (and totally destroying my array in the process), so I thought I'd ask here first. Here is my /proc/mdstat from when recovery initially began:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdc1[0](S) sdf1[5] sdb1[4] sda1[3] sde1[2] sdd1[1]
     976783616 blocks level 5, 32k chunk, algorithm 2 [5/4] [_UUUU]
[>....................] recovery = 0.0% (237952/244195904) finish=427.0min speed=9518K/sec

and here is my current mdstat:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdc1[5](S) sdf1[6](S) sdb1[4] sda1[3] sde1[7](F) sdd1[1]
     976783616 blocks level 5, 32k chunk, algorithm 2 [5/3] [_U_UU]

sde is the disk that is now marked as failed, and which I would like to put back into service.
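
Before forcing anything I was planning to double-check the ordering by comparing the superblocks on each member, roughly like this (a sketch, assuming the partition names haven't changed):

  # print each member's superblock: array UUID, this device's raid slot, and event count
  for d in /dev/sd[a-f]1; do
      echo "== $d =="
      mdadm --examine "$d"
  done

If the slot numbers and event counts there line up with the mdstat above, I assume a --force assemble is reasonably safe?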


Also, what does the number in []'s after each device mean, and why did that number change on sdc, sde, and sdf?

Thanks, Frank

