I would first ddrescue to all new drives, then work from there... if you expect you will want to be able to undo, you could even do as I did once: ddrescue onto LVM logical volumes, then work against snapshots of those volumes to get the data out while keeping the ability to undo anything you do. You should then be able to mount it read-only and get at least the reshaped data, and probably most of the other data as well.
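Roughly the shape of it is below - the volume group name, LV sizes, md device number and mount point are just placeholders for whatever your setup looks like, and I haven't thought through how your half-finished reshape interacts with a read-only assemble, so treat it as a sketch rather than a recipe:

  # one LV per member disk, big enough to hold a complete image of it
  lvcreate -L 2T -n sda_copy vg_rescue
  # image the disk onto the LV; the map file lets ddrescue resume and
  # go back for the bad areas on later passes (-r3 = three retry passes)
  ddrescue -f -r3 /dev/sda /dev/vg_rescue/sda_copy /root/sda.map
  # snapshot each copy before experimenting, so anything mdadm or fsck
  # does can be thrown away by simply deleting the snapshot
  lvcreate -s -L 50G -n sda_try /dev/vg_rescue/sda_copy
  # then work only against the snapshots, never the original disks
  mdadm --assemble --readonly /dev/md1 /dev/vg_rescue/*_try
  mount -o ro /dev/md1 /mnt/rescue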
In any case, whatever you do - don't try to USE that failing drive for anything other than data recovery from that drive to a new drive... in my experience, drives only go downhill once they get to where that one is.

I expect that if you ddrescue the failed drive you will get 99.9% of your data (missing only whatever sits on the bad sector or two you noticed, possibly not even that much).

Also, don't WRITE to that array - avoid anything that could make the failed drive more out of sync.

Sam

> -----Original Message-----
> From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Barrett Lewis
> Sent: Wednesday, May 01, 2013 4:53 AM
> To: linux-raid@xxxxxxxxxxxxxxx
> Subject: Anything left to try?
>
> For about two months, on and off, I've been struggling to outrun cascade
> failures with my hardware and to recover as much of my data as possible.
> I think I am finally at a point where it might be time to stick a fork
> in it, but I wanted to ask if there was any magic left I could try at
> this point.
>
>
> <backstory>
> I was starting to hear mechanical problems in my drives and decided to
> reshape my array from RAID5 to RAID6. In the middle, the server crashed,
> and for irrelevant reasons the backup file was lost. I was able to
> re-assemble the array with the --invalid-backup flag, which was an
> amazing miracle after a month of thinking all was lost. I did not trust
> my hardware to survive the rest of the reshape before having another
> failure, so I started rsyncing all the files off.
> At this point I had what was originally a synced RAID5, which was
> growing/reshaping to a RAID6 onto a new drive. Eventually /dev/sdd
> totally failed and got marked F as per /proc/mdstat. The array was still
> operational and still showed 4 of 6 components active, with the fifth
> still being resynced. The rsync was going while it was reshaping, since
> I didn't have faith the array would live long enough for the 5-day
> reshape to complete, which would have gotten me up to 5/6 components of
> the RAID6, at which point I would have added another drive for 6/6. The
> reason I didn't immediately add the extra drive is because I really felt
> this array and its drives were a timebomb, and I was more immediately
> interested in copying the data off than in trying to get the full RAID6
> running (if the reshape was going to take 5 days and I expected another
> drive to fail within a few).
> Anyway, quite a bit of my data got rsynced off; this morning I woke up
> to find another drive (/dev/sda) in the array had failed, as I
> anticipated.
> </backstory>
>
>
>
> I am really confused about what kind of state it is in:
>
> cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md0 : active raid6 sdb[0] sdc[7] sda[5](F) sde[4] sdd[2](F) sdf[6]
>       7813531648 blocks super 1.2 level 6, 512k chunk, algorithm 18 [6/3] [UU_U__]
>
> unused devices: <none>
>
> So judging from the above, I would expect the array to be totally
> offline/broken. 3 out of 6 drives in a RAID6? But my array is still
> "active" and I still have it mounted; I can still see the folder
> structure and many files, just certain folders give I/O errors when you
> try to ls them. How is this possible? Why do I have some files but not
> others? Is it wrong to expect all or nothing of a filesystem on md0
> (ext4)?
>
> My theory: the array had been 40% through the reshape onto the new drive
> (even with 4/5 of the 'old' components), then 1 more of the old
> components died, so I am at 3/5 of the old components PLUS a new drive
> with 40% of the stripes, so right now in essence I have 40% of the 4th
> drive needed for minimum RAID6 operation, and thus can see 40% of the
> files? I don't know if mdadm is capable of that kind of magic, but I
> can't otherwise explain how my array is even assembled right now. Can
> anyone tell me if this is possible/accurate? If it is something like
> this, why doesn't mdstat say something other than "active", like
> "degraded", etc.?
>
>
>
> The important question: I managed to get ~80% of my data rsynced off
> before sda failed. /dev/sda still seems partially functional: Disk
> Utility says /dev/sda is "green" but with 1 bad sector, it has readable
> status from --examine, but occasionally it makes awful mechanical
> noises. I'm wondering if mdadm hit some error threshold and marked it as
> failed early to try to protect the rest of the array. Since the array is
> otherwise hosed, I would be willing to do anything (potentially
> destructive) to *attempt* to bring /dev/sda back into the array, even
> for only a day or so (to rsync as much of the remaining 20% as
> possible), before I have to dismantle the array. Is there any way I can
> say "mdadm, do your best and try to work with /dev/sda if you can"? Or
> is it time to move on?
>
> sudo mdadm --examine /dev/sd[a-f]
> /dev/sda:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
>            Name : mainframe:vault (local to host mainframe)
>   Creation Time : Wed Aug 15 21:57:14 2012
>      Raid Level : raid6
>    Raid Devices : 6
>
>  Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
>      Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
>   Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
>     Data Offset : 262144 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : f18da9cc:27f5eee4:61ba900e:dd6ca8b9
>
>   Reshape pos'n : 3277520896 (3125.69 GiB 3356.18 GB)
>      New Layout : left-symmetric
>
>     Update Time : Sun Apr 21 06:27:31 2013
>        Checksum : 75147b14 - correct
>          Events : 755496
>
>          Layout : left-symmetric-6
>      Chunk Size : 512K
>
>     Device Role : Active device 4
>     Array State : AA.AAA ('A' == active, '.' == missing)
> /dev/sdb:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
>            Name : mainframe:vault (local to host mainframe)
>   Creation Time : Wed Aug 15 21:57:14 2012
>      Raid Level : raid6
>    Raid Devices : 6
>
>  Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
>      Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
>   Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
>     Data Offset : 262144 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 004a89c7:bd03e0fe:b6ea3ab9:76e5e5e0
>
>   Reshape pos'n : 3277520896 (3125.69 GiB 3356.18 GB)
>      New Layout : left-symmetric
>
>     Update Time : Sun Apr 21 13:41:12 2013
>        Checksum : 5bc7638b - correct
>          Events : 759402
>
>          Layout : left-symmetric-6
>      Chunk Size : 512K
>
>     Device Role : Active device 0
>     Array State : AA.A.A ('A' == active, '.' == missing)
> /dev/sdc:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x6
>      Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
>            Name : mainframe:vault (local to host mainframe)
>   Creation Time : Wed Aug 15 21:57:14 2012
>      Raid Level : raid6
>    Raid Devices : 6
>
>  Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
>      Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
>   Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
>     Data Offset : 262144 sectors
>    Super Offset : 8 sectors
> Recovery Offset : 1638760448 sectors
>           State : clean
>     Device UUID : 0d8ddf14:2601f343:0b7e182f:cc8358e9
>
>   Reshape pos'n : 3277520896 (3125.69 GiB 3356.18 GB)
>      New Layout : left-symmetric
>
>     Update Time : Sun Apr 21 13:41:12 2013
>        Checksum : ce2e55b3 - correct
>          Events : 759402
>
>          Layout : left-symmetric-6
>      Chunk Size : 512K
>
>     Device Role : Active device 5
>     Array State : AA.A.A ('A' == active, '.' == missing)
> /dev/sde:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
>            Name : mainframe:vault (local to host mainframe)
>   Creation Time : Wed Aug 15 21:57:14 2012
>      Raid Level : raid6
>    Raid Devices : 6
>
>  Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
>      Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
>   Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
>     Data Offset : 262144 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 1df1fd17:592f431a:f3f05592:fbfccdcd
>
>   Reshape pos'n : 3277520896 (3125.69 GiB 3356.18 GB)
>      New Layout : left-symmetric
>
>     Update Time : Sun Apr 21 13:41:12 2013
>        Checksum : 8da25408 - correct
>          Events : 759402
>
>          Layout : left-symmetric-6
>      Chunk Size : 512K
>
>     Device Role : Active device 3
>     Array State : AA.A.A ('A' == active, '.' == missing)
> /dev/sdf:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x4
>      Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
>            Name : mainframe:vault (local to host mainframe)
>   Creation Time : Wed Aug 15 21:57:14 2012
>      Raid Level : raid6
>    Raid Devices : 6
>
>  Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
>      Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
>   Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
>     Data Offset : 262144 sectors
>    Super Offset : 8 sectors
>           State : clean
>     Device UUID : 15dcad1e:3808a229:7409b3aa:4e03ae1b
>
>   Reshape pos'n : 3277520896 (3125.69 GiB 3356.18 GB)
>      New Layout : left-symmetric
>
>     Update Time : Sun Apr 21 13:41:12 2013
>        Checksum : 9ee36b5 - correct
>          Events : 759402
>
>          Layout : left-symmetric-6
>      Chunk Size : 512K
>
>     Device Role : Active device 1
>     Array State : AA.A.A ('A' == active, '.' == missing)