Re: 3-disk fail on raid-6, examining my options...

Wakko Warner <wakko@xxxxxxxxxxxx> · Wed, 19 Jul 2017 13:09:14 -0400

Wols Lists wrote:
> On 18/07/17 21:25, Wakko Warner wrote:
> > Wols Lists wrote:
> >> On 18/07/17 18:20, Maarten wrote:
> >>> Now from what I've gathered over the years and from earlier incidents, I
> >>> have now 1 (one) chance left to rescue data off this array; by hopefully
> >>> cloning the bad 3rd-failed drive with the aid of dd_rescue and
> >>> re-assembling --force the fully-degraded array. (Only IF that drive is
> >>> still responsive and can be cloned)
> >>
> >> If it clones successfully, great. If it clones, but with badblocks, I
> >> keep on asking - is there any way we can work together to turn
> >> dd-rescue's log into a utility that will flag failed blocks as "unreadable"?
> > 
> > I wrote a shell script that will output a device mapper table to do this. 
> > It will do either zero or error targets for failed blocks.  It's not
> > automatic and does require a block device (loop for files).  I've used this
> > several times at work and it works for me.
> > 
> > I'm not sure if this is what you're talking about or not, but if you want
> > the script, I'll post it.
> > 
> I'm not sure I understand what you're saying, but I'm certainly
> interested. It'll probably end up on the wiki if that's okay with you?

That's fine.

> I'll aim to understand and document it so others will be able hopefully
> to use it as a "fire and forget" tool (inasmuch as you can
> fire-and-forget any recovery task :-)
> 
> What I'm thinking of is a utility that uses "hdparm --make-bad-sector".
> The idea being that if you have multiple disk failures, you can at least
> clone everything worth having off the broken disks, and then you can run
> a "tar . > /dev/null" or do a sync or whatever, and know that if it
> reads successfully off the array it isn't corrupt. Unless you're unlucky
> enough to have multiple drives fail in the same stripe, you should then
> recover your array no problem.

That's pretty much how I use it.

Here's a real ddrescue log from one recovery that I did:
# Rescue Logfile. Created by GNU ddrescue version 1.16
# Command line: ddrescue -s 85900394496 /dev/sdg /path/to/image.img /path/to/image.log
# current_pos  current_status
0xA078F9C00     +
#      pos        size  status
0x00000000  0xA078F9000  +
0xA078F9000  0x00001000  -
0xA078FA000  0x9F8806000  +
0x1400100000  0x2638A2E000  ?

I use losetup to make /path/to/image.img a block device.

I run the script I wrote:
sh ddlog-to-dm.sh /dev/loop0 < /path/to/image.log

Which outputs the following:
0 84133832 linear /dev/loop0 0
84133832 8 error
84133840 83640368 linear /dev/loop0 84133840
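
For anyone reading without the attachment, here is a minimal sketch of how
such a conversion could be done.  It is NOT the attached ddlog-to-dm.sh; the
function name ddlog_to_dm and its optional second argument are made up for
illustration.  It assumes 512-byte sectors, positions and sizes that are
multiples of 512, and that a trailing non-tried ("?") region should simply be
left out of the table, as in the output above.

```sh
#!/bin/sh
# Hypothetical sketch, not the attached script.
# Usage: ddlog_to_dm DEVICE [error|zero] < ddrescue.log
ddlog_to_dm() {
    dev=$1
    bad=${2:-error}     # target used for unreadable areas: "error" or "zero"
    seen_status=0       # set to 1 once the current_pos status line is skipped
    pending=''          # non-tried lines held back until a later block follows

    while read -r pos size status rest; do
        case $pos in
            ''|'#'*) continue ;;        # skip blank lines and comments
        esac
        if [ "$seen_status" -eq 0 ]; then
            seen_status=1               # first data line: current_pos/status
            continue
        fi

        # The log uses hex byte offsets; device mapper wants 512-byte sectors.
        start=$(( $pos / 512 ))
        len=$(( $size / 512 ))

        case $status in
            '+') line="$start $len linear $dev $start" ;;
            *)   line="$start $len $bad" ;;
        esac

        if [ "$status" = '?' ]; then
            # Hold non-tried blocks back; they are dropped if nothing follows.
            pending="$pending$line
"
        else
            printf '%s' "$pending"
            pending=''
            printf '%s\n' "$line"
        fi
    done
}
```

Called as "ddlog_to_dm /dev/loop0 < /path/to/image.log", this should print the
three table lines shown above for the log in this mail.  Passing "zero" as the
second argument makes the bad area read back as zeros instead of failing.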

Then I run:
dmsetup create sometarget

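dmsetup reads the table from stdin, so I paste in the output (ending with
Ctrl-D) and I now have /dev/mapper/sometarget, which returns I/O errors for
the sectors that were bad.  Since the error target fails the read immediately
without touching the disk, there are no retries as there would be with a real
bad sector.  This also works with hard disks instead of images.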
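
To confirm that the mapped device really fails where expected, something like
the following sketch can help.  The helper name print_error_probes is made up;
it only prints one dd command per "error" line of a table read from stdin and
does not touch any device itself.

```sh
#!/bin/sh
# Hypothetical helper: for every "error" line in a device mapper table on
# stdin, print a dd command that reads exactly that sector range from the
# mapped device given as $1.
print_error_probes() {
    mapped=$1
    while read -r start len target rest; do
        if [ "$target" = error ]; then
            printf 'dd if=%s of=/dev/null bs=512 skip=%s count=%s\n' \
                "$mapped" "$start" "$len"
        fi
    done
}
```

For example, "dmsetup table sometarget | print_error_probes
/dev/mapper/sometarget" prints the probe commands; each one, run as root,
should fail with an I/O error rather than return data.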

To work with a real disk, skip the losetup part and use /dev/sdX instead of
/dev/loop0.  In my case above, assuming I had cloned sdg to sdh, I would do:
sh ddlog-to-dm.sh /dev/sdh < /path/to/image.log
dmsetup create sdh

Then use /dev/mapper/sdh.

Note that device mapper does not create partition devices for the new
target; each partition needs a target of its own.  I use kpartx -a for this
and when I'm done, I use kpartx -d to tear it down.

When you're done, run dmsetup remove sometarget and detach the loop device
(losetup -d).
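
The whole sequence, including the teardown order (partition mappings first,
then the target, then the loop device), could be wrapped roughly as below.
This is only a sketch: the function names run, map_up and map_down and the
DRY_RUN variable are made up, and the real commands need root.

```sh
#!/bin/sh
# run CMD...: execute the command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# map_up LOOPDEV IMAGE NAME < table
# Attach the image, create the target from the table on stdin, then add
# partition mappings on top of it.
map_up() {
    run losetup "$1" "$2" &&
    run dmsetup create "$3" &&
    run kpartx -a "/dev/mapper/$3"
}

# map_down LOOPDEV NAME
# Undo map_up in reverse order.
map_down() {
    run kpartx -d "/dev/mapper/$2" &&
    run dmsetup remove "$2" &&
    run losetup -d "$1"
}
```

With DRY_RUN=1 set, "map_down /dev/loop0 sometarget" just lists the three
commands it would run, which is a cheap way to check the order before doing
it for real.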

I have attached the script.

-- 
 Microsoft has beaten Volkswagen's world record.  Volkswagen only created 22
 million bugs.
Attachment:
ddlog-to-dm.sh

Description: Bourne shell script