That is a really good idea. At minimum I will ddrescue the failed
drives. But even if I do that, is there a way to reassemble the array
(even with new drives) now that those components are marked failed?

On Thu, May 2, 2013 at 1:01 AM, Sam Bingner <sam@xxxxxxxxxxx> wrote:
> I would first ddrescue to all new drives, then work from there... if
> you expect you will want to be able to undo, you could even do as I
> did once and ddrescue into LVM logical volumes, then work with
> snapshots to get at the data while retaining the ability to undo
> anything you do.
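>
> Roughly, that looks something like the sketch below (illustrative
> only - the volume group and LV names/sizes are made up, and each copy
> LV has to be at least as large as the drive it is rescuing):
>
>   # First pass: copy everything that reads cleanly, skipping the slow
>   # scraping of bad areas; the map file records progress.
>   lvcreate -L 2T -n sda_copy rescue_vg
>   ddrescue -f -n /dev/sda /dev/rescue_vg/sda_copy sda.map
>
>   # Second pass: retry the bad areas a few times with direct disc
>   # access, resuming from the same map file.
>   ddrescue -f -d -r3 /dev/sda /dev/rescue_vg/sda_copy sda.map
>
>   # Experiment on a snapshot of the copy, so anything destructive can
>   # be undone by simply dropping the snapshot.
>   lvcreate -s -L 100G -n sda_work /dev/rescue_vg/sda_copy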
>
> You should be able to then mount it read-only and get at least the
> reshaped data, and probably most of the other data.
>
> In any case, whatever you do - don't try to USE that failing drive
> for anything other than data recovery from that drive to a new
> drive... in my experience, drives only go downhill once they get to
> where that one is.
>
> I expect if you ddrescue the failed drive you will get 99.9% of your
> data (just missing whatever was on that bad sector or two you
> noticed, possibly not even all that much).
>
> Also, don't WRITE to that array - avoid anything that could make the
> failed drive more out of sync.
>
> Sam
>
>> -----Original Message-----
>> From: linux-raid-owner@xxxxxxxxxxxxxxx
>> [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of Barrett Lewis
>> Sent: Wednesday, May 01, 2013 4:53 AM
>> To: linux-raid@xxxxxxxxxxxxxxx
>> Subject: Anything left to try?
>>
>> For about two months, on and off, I've been struggling to outrun
>> cascading failures in my hardware and to recover as much of my data
>> as possible. I think I am finally at a point where it might be time
>> to stick a fork in it, but I wanted to ask if there is any magic
>> left I could try.
>>
>> <backstory>
>> I was starting to hear mechanical problems in my drives and decided
>> to reshape my array from RAID5 to RAID6. In the middle, the server
>> crashed, and for irrelevant reasons the backup file was lost. I was
>> able to re-assemble the array with the --invalid-backup flag, which
>> was an amazing miracle after a month of thinking all was lost. I did
>> not trust my hardware to survive the rest of the reshape before
>> another failure, so I started rsyncing all the files off.
>> At that point I had what was originally a synced RAID5,
>> growing/reshaping to a RAID6 onto a new drive. Eventually /dev/sdd
>> totally failed and was marked F in /proc/mdstat. The array was still
>> operational and still showed 4 of 6 components active, with the
>> fifth still being resynced. The rsync ran while the array was
>> reshaping, since I didn't have faith it would live long enough for
>> the 5-day reshape to complete, which would have gotten me up to 5/6
>> components of the RAID6, at which point I would have added another
>> drive for 6/6. The reason I didn't immediately add the extra drive
>> is that I really felt this array and its drives were a time bomb,
>> and I was more immediately interested in copying the data off than
>> in getting the full RAID6 running (the reshape was going to take 5
>> days and I expected another drive to fail within a few). Anyway,
>> quite a bit of my data got rsynced off. This morning I woke up to
>> find that another drive in the array (/dev/sda) had failed, as I
>> anticipated.
>> </backstory>
>>
>> I am really confused about what kind of state it is in:
>>
>> cat /proc/mdstat
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>> md0 : active raid6 sdb[0] sdc[7] sda[5](F) sde[4] sdd[2](F) sdf[6]
>>       7813531648 blocks super 1.2 level 6, 512k chunk, algorithm 18 [6/3] [UU_U__]
>>
>> unused devices: <none>
>>
>> Judging from the above, I would expect the array to be totally
>> offline/broken - 3 out of 6 drives in a RAID6? But my array is still
>> "active" and I still have it mounted; I can still see the folder
>> structure and many files, but certain folders give I/O errors when I
>> try to ls them. How is this possible? Why do I have some files but
>> not others? Is it wrong to expect all or nothing from the filesystem
>> on md0 (ext4)?
>>
>> My theory: the array was 40% through the reshape onto the new drive
>> (even with only 4/5 of the 'old' components), then one more of the
>> old components died, so I am at 3/5 of the old components PLUS a new
>> drive holding 40% of the stripes. In essence I have 40% of the 4th
>> drive needed for minimum RAID6 operation, and thus can see 40% of
>> the files. I don't know if mdadm is capable of that kind of magic,
>> but I can't otherwise explain how my array is even assembled right
>> now. Can anyone tell me if this is possible/accurate? If it is
>> something like this, why doesn't mdstat say something other than
>> "active", like "degraded"?
>>
>> The important question: I managed to get ~80% of my data rsynced off
>> before sda failed. /dev/sda still seems partially functional - Disk
>> Utility says it is "green" but with 1 bad sector, and it has
>> readable status from --examine - but occasionally it makes awful
>> mechanical noises. I'm wondering if mdadm hit some error threshold
>> and marked it failed early to protect the rest of the array. Since
>> the array is otherwise hosed, I would be willing to do anything
>> (potentially destructive) to *attempt* to bring /dev/sda back into
>> the array, even for only a day or so (to rsync as much of the
>> remaining 20% as possible), before I have to dismantle the array.
>> Is there any way I can say "mdadm, do your best and try to work with
>> /dev/sda if you can"? Or is it time to move on?
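>>
>> (For what it's worth, my naive guess at what "do your best" would
>> look like is a forced assemble from the ddrescue'd copies - purely a
>> sketch, I haven't run any of this, and the mount point is just a
>> placeholder:
>>
>>   mdadm --stop /dev/md0
>>   mdadm --assemble --force /dev/md0 /dev/sd[a-f]
>>   mount -o ro /dev/md0 /mnt/vault
>>
>> ...but I have no idea whether --force is safe, or even meaningful,
>> in the middle of an interrupted reshape.)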
>>
>> sudo mdadm --examine /dev/sd[a-f]
>> /dev/sda:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x4
>>      Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
>>            Name : mainframe:vault (local to host mainframe)
>>   Creation Time : Wed Aug 15 21:57:14 2012
>>      Raid Level : raid6
>>    Raid Devices : 6
>>
>>  Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
>>      Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
>>   Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
>>     Data Offset : 262144 sectors
>>    Super Offset : 8 sectors
>>           State : active
>>     Device UUID : f18da9cc:27f5eee4:61ba900e:dd6ca8b9
>>
>>   Reshape pos'n : 3277520896 (3125.69 GiB 3356.18 GB)
>>      New Layout : left-symmetric
>>
>>     Update Time : Sun Apr 21 06:27:31 2013
>>        Checksum : 75147b14 - correct
>>          Events : 755496
>>
>>          Layout : left-symmetric-6
>>      Chunk Size : 512K
>>
>>     Device Role : Active device 4
>>     Array State : AA.AAA ('A' == active, '.' == missing)
>> /dev/sdb:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x4
>>      Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
>>            Name : mainframe:vault (local to host mainframe)
>>   Creation Time : Wed Aug 15 21:57:14 2012
>>      Raid Level : raid6
>>    Raid Devices : 6
>>
>>  Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
>>      Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
>>   Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
>>     Data Offset : 262144 sectors
>>    Super Offset : 8 sectors
>>           State : clean
>>     Device UUID : 004a89c7:bd03e0fe:b6ea3ab9:76e5e5e0
>>
>>   Reshape pos'n : 3277520896 (3125.69 GiB 3356.18 GB)
>>      New Layout : left-symmetric
>>
>>     Update Time : Sun Apr 21 13:41:12 2013
>>        Checksum : 5bc7638b - correct
>>          Events : 759402
>>
>>          Layout : left-symmetric-6
>>      Chunk Size : 512K
>>
>>     Device Role : Active device 0
>>     Array State : AA.A.A ('A' == active, '.' == missing)
>> /dev/sdc:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x6
>>      Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
>>            Name : mainframe:vault (local to host mainframe)
>>   Creation Time : Wed Aug 15 21:57:14 2012
>>      Raid Level : raid6
>>    Raid Devices : 6
>>
>>  Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
>>      Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
>>   Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
>>     Data Offset : 262144 sectors
>>    Super Offset : 8 sectors
>> Recovery Offset : 1638760448 sectors
>>           State : clean
>>     Device UUID : 0d8ddf14:2601f343:0b7e182f:cc8358e9
>>
>>   Reshape pos'n : 3277520896 (3125.69 GiB 3356.18 GB)
>>      New Layout : left-symmetric
>>
>>     Update Time : Sun Apr 21 13:41:12 2013
>>        Checksum : ce2e55b3 - correct
>>          Events : 759402
>>
>>          Layout : left-symmetric-6
>>      Chunk Size : 512K
>>
>>     Device Role : Active device 5
>>     Array State : AA.A.A ('A' == active, '.' == missing)
>> /dev/sde:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x4
>>      Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
>>            Name : mainframe:vault (local to host mainframe)
>>   Creation Time : Wed Aug 15 21:57:14 2012
>>      Raid Level : raid6
>>    Raid Devices : 6
>>
>>  Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
>>      Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
>>   Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
>>     Data Offset : 262144 sectors
>>    Super Offset : 8 sectors
>>           State : clean
>>     Device UUID : 1df1fd17:592f431a:f3f05592:fbfccdcd
>>
>>   Reshape pos'n : 3277520896 (3125.69 GiB 3356.18 GB)
>>      New Layout : left-symmetric
>>
>>     Update Time : Sun Apr 21 13:41:12 2013
>>        Checksum : 8da25408 - correct
>>          Events : 759402
>>
>>          Layout : left-symmetric-6
>>      Chunk Size : 512K
>>
>>     Device Role : Active device 3
>>     Array State : AA.A.A ('A' == active, '.' == missing)
>> /dev/sdf:
>>           Magic : a92b4efc
>>         Version : 1.2
>>     Feature Map : 0x4
>>      Array UUID : 8bc78af0:d9a981e3:73549f21:2f76cd24
>>            Name : mainframe:vault (local to host mainframe)
>>   Creation Time : Wed Aug 15 21:57:14 2012
>>      Raid Level : raid6
>>    Raid Devices : 6
>>
>>  Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
>>      Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
>>   Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
>>     Data Offset : 262144 sectors
>>    Super Offset : 8 sectors
>>           State : clean
>>     Device UUID : 15dcad1e:3808a229:7409b3aa:4e03ae1b
>>
>>   Reshape pos'n : 3277520896 (3125.69 GiB 3356.18 GB)
>>      New Layout : left-symmetric
>>
>>     Update Time : Sun Apr 21 13:41:12 2013
>>        Checksum : 9ee36b5 - correct
>>          Events : 759402
>>
>>          Layout : left-symmetric-6
>>      Chunk Size : 512K
>>
>>     Device Role : Active device 1
>>     Array State : AA.A.A ('A' == active, '.' == missing)
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html