Re: The right way to recover from md partition failure?

David Greaves <david@xxxxxxxxxxxx> · Mon, 30 Aug 2004 22:33:17 +0100

I think a better approach might be:

mdadm /dev/md1 -r /dev/hde3
dd if=/dev/hde3 of=/dev/null
check logs for nasty errors and only continue if there weren't any :)
mdadm /dev/md1 -a /dev/hde3

Having done this very thing this afternoon!!
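
For anyone scripting the sequence above, here is a minimal sketch of it as a shell function. The name readd_if_readable is made up for illustration, and it assumes dd exits non-zero when a read fails. It is no substitute for reading the logs yourself before trusting the disk.

```shell
# Sketch only: remove a member, read-test it, re-add it only if the read is clean.
# readd_if_readable is a hypothetical name; the caller supplies array and partition.
readd_if_readable() {
    array=$1
    part=$2

    # Take the partition out of the array. If it was already removed this
    # fails, which is harmless because the read test below changes nothing.
    mdadm "$array" -r "$part" || echo "remove of $part failed, continuing" >&2

    # Read the whole partition; dd is assumed to exit non-zero on a read error.
    if ! dd if="$part" of=/dev/null; then
        echo "read errors on $part, not re-adding" >&2
        return 1
    fi

    # The read was clean, so hand the partition back to md for a resync.
    mdadm "$array" -a "$part"
}
```

Usage would be something like: readd_if_readable /dev/md1 /dev/hde3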

If you have "some console messages about a bad block or something" then 
I'd make damn sure your disk is good before putting it back.

If you end up doing lots of retries during the resync and an error 
occurs on a remaining drive you'll be sorry!

In general a raid failure means you should suspect a disk failure.

I just wish Jeff G would get off his backside and make SMART work with 
libata - doesn't the man work on bank holidays? ;)

David

Guy wrote:

No need to copy, that's what md does.

Verify that the disk is not part of the array:
mdadm -D /dev/md1

I bet you will find the disk is there, but failed.
So, raidhotremove it, then raidhotadd it.

mdadm is the preferred tool.  The old raidtools are not supported.
For details:
man mdadm

You may need to install mdadm.

mdadm --manage /dev/md1 -r /dev/hde3
mdadm --manage /dev/md1 -a /dev/hde3

or short form:
mdadm /dev/md1 -r /dev/hde3
mdadm /dev/md1 -a /dev/hde3

It should start to re-sync.  Monitor the status with:
cat /proc/mdstat
and/or
mdadm -D /dev/md1
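
To spot a degraded array without reading the output by eye, something like the following might do. It is only a sketch: degraded_arrays is a hypothetical helper, and it assumes the /proc/mdstat layout shown in the original question quoted below, where a missing member appears as an underscore in a field such as [_U].

```shell
# Sketch only: print the name of every md array that is missing a member.
# degraded_arrays is a hypothetical helper; it reads /proc/mdstat unless
# another file is given as the first argument.
degraded_arrays() {
    awk '
        # An array header such as "md1 : active raid1 hdg3[1]".
        /^md[0-9]+ :/ { name = $1 }
        # A status field such as [_U]: an underscore marks a missing member.
        name != "" && /\[[U_]*_[U_]*\]/ { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}
```

Run with no arguments it reads /proc/mdstat. No output means no array reported a missing member.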

Guy

-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx
[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Jonathan Baker-Bates
Sent: Monday, August 30, 2004 3:39 PM
To: linux-raid@xxxxxxxxxxxxxxx
Subject: The right way to recover from md partition failure?

I've been reading various FAQs and HOWTOs, but for some reason can't really
get an answer to what I assume is a simple question about how best to get a
failed md RAID 1 partition back into an array.

After a power-outage, I see that cat /proc/mdstat shows:

Personalities : [raid1]
read_ahead 1024 sectors
Event: 3
md1 : active raid1 hdg3[1]
     178787264 blocks [2/1] [_U]

md0 : active raid1 hde2[0] hdg2[1]
     2048192 blocks [2/2] [UU]

md2 : active raid1 hde1[0] hdg1[1]
     104320 blocks [2/2] [UU]

unused devices: <none>

So it looks like /dev/hde3 is down. I'm not sure exactly why this is, but
there were some console messages about a bad block or something. So,
assuming hdg3 is OK (which it seems to be) can I just do the following?

Copy good partition to bad one:

dd if=/dev/hdg3 of=/dev/hde3

Add the resulting copy to the raid:

raidhotadd /dev/md1 /dev/hde3

fsck /dev/md1 to make sure all is well.

Is there a better way?

Jonathan

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
