Re: Two Drive Failure on RAID-5

----- Original Message ----- From: "David Greaves" <david@xxxxxxxxxxxx>
To: "Cry" <cry_regarder@xxxxxxxxx>
Cc: <linux-raid@xxxxxxxxxxxxxxx>
Sent: Wednesday, May 21, 2008 10:15 PM
Subject: Re: Two Drive Failure on RAID-5


Cry wrote:
David Greaves <david <at> dgreaves.com> writes:
Cry wrote:
ddrescue /dev/SOURCE /dev/TARGET /somewhere_safe/logfile


unless you've rebooted:
blockdev --setrw /dev/SOURCE
blockdev --setra  <saved readahead value> /dev/SOURCE
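
(Side note, not part of the original instructions: blockdev --getra /dev/SOURCE prints whatever readahead value is currently set, if you want to check it before and after.)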

mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
/dev/sde1

cat /proc/mdstat will show the drive status
mdadm --detail /dev/md0
mdadm --examine /dev/sd[abcdef]1 [components]

I performed the above steps; however, I used dd_rescue instead of ddrescue.
Similar software. I think dd_rescue is more 'scripted' and less well maintained.

]# dd_rescue -l sda_rescue.log -o sda_rescue.bad -v /dev/sda /dev/sdg1

doh!!
You copied the disk (/dev/sda) into a partition (/dev/sdg1)...
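
For reference, the whole-disk copy the earlier instructions meant would look more like this (GNU ddrescue syntax; assuming /dev/sdg really is the 750G replacement disk):

ddrescue /dev/sda /dev/sdg /somewhere_safe/logfile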


dd_rescue: (info): /dev/sda (488386592.0k): EOF
Summary for /dev/sda -> /dev/sdg1:
dd_rescue: (info): ipos: 488386592.0k, opos: 488386592.0k, xferd: 488386592.0k
                   errs:    504, errxfer:       252.0k, succxfer: 488386336.0k
             +curr.rate:    47904kB/s, avg.rate:    14835kB/s, avg.load:  9.6%
So you lost 252k of data. There may be filesystem corruption, a file may be corrupt, or some blank disk space may just be even more blank. It's almost impossible to tell.

dd_rescue shows it when the target device is full.
The errs count is divisible by 8, so I think these are only bad sectors (504 errors x 512-byte hard blocks = 252k, which matches errxfer).

But let me note:
With the default -b 64k, dd_rescue sometimes drops the entire soft-block area on the first error! If you want a more precise result, run it again with -b 4096 and -B 1024, and if you can, don't copy the drive to a partition! :-)
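
A sketch of what such a rerun could look like, assuming the whole replacement disk /dev/sdg as the target and made-up log/bad-block file names (and note David's concern below about stressing /dev/sda further):

dd_rescue -v -b 4096 -B 1024 -l sda_rescue2.log -o sda_rescue2.bad /dev/sda /dev/sdg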


[aside: It would be nice if we could take the output from ddrescue and friends
to determine what the lost blocks map to via the md stripes.]
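
A very rough, untested sketch of that mapping, assuming a 0.90 superblock (member data starts at sector 0), the default left-symmetric layout, a 64k chunk and the five devices above (mdadm --detail /dev/md0 shows the real values):

BADSECT=123456789                    # hypothetical bad 512-byte sector on /dev/sda1
CHUNK_SECTORS=$((64 * 1024 / 512))   # 64k chunk = 128 sectors
RAID_DISKS=5
STRIPE=$((BADSECT / CHUNK_SECTORS))
PARITY_DISK=$((RAID_DISKS - 1 - STRIPE % RAID_DISKS))
echo "sector $BADSECT -> stripe $STRIPE, parity on raid device index $PARITY_DISK"
# if mdadm --examine shows /dev/sda1 at that raid device index, the lost
# blocks only hit parity in this stripe; otherwise they hit data (or free space)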

/dev/sdg1 is my replacement drive (750G) that I had tried to sync previously.
No. /dev/sdg1 is a *partition* on that drive, not the whole disk.
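
A quick sanity check of the disk vs. partition distinction (standard kernel view, nothing special):

grep sdg /proc/partitions

The whole disk (sdg) and the partition (sdg1) report different sizes there, and the rescued image now starts at the beginning of the partition, i.e. some way into the disk.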

I'm concerned that running the first ddrescue may have stressed /dev/sda, and that you'd lose data running it again with the correct arguments.

How do I transfer the label from /dev/sda (no partitions) to /dev/sdg1?
Can anyone suggest anything?

Cry, I only have this idea:
dd_rescue -v -m 128k -r /dev/source -S 128k superblock.bin
losetup /dev/loop0 superblock.bin
mdadm --build -l linear --raid-devices=2 /dev/md1 /dev/sdg1 /dev/loop0

And the working RAID member would be /dev/md1. ;-)
But only for recovery!!!

(Only an idea, not tested.)

Cheers,
Janos


Cry, don't do this...

I wonder about
dd if=/dev/sdg1 of=/dev/sdg
but goodness knows if it would work... it'd rely on dd's reads from the start of the partition device and its writes to the disk device not overlapping - which they shouldn't, but...
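
One way to sanity-check that overlap worry, assuming a normal partition table on /dev/sdg (hypothetical check, not something tried here):

fdisk -lu /dev/sdg

shows the start sector of sdg1; a sequential dd then keeps its read position inside sdg1 that many sectors ahead of its write position on sdg, so nothing should be overwritten before it has been read.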

David
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
