On Sun, Dec 27, 2009 at 2:47 PM, Leslie Rhorer <lrhorer@xxxxxxxxxxx> wrote:
>> On Sun, 2009-12-27 at 00:13 -0600, Leslie Rhorer wrote:
>> > > # mdadm --examine /dev/sdb1
>> > > mdadm: No md superblock detected on /dev/sdb1.
>> > >
>> > > (Does this mean that sdb1 is bad? or is that OK?)
>> >
>> > It doesn't necessarily mean the drive is bad, but the superblock is
>> > gone. Are you having mdadm monitor your array(s) and send informational
>> > messages to you upon RAID events? If not, then what may have happened
>> > is you lost the superblock on sdb1 and at some other time - before or
>> > after - lost the sda drive. Once both events had taken place, your
>> > array is toast.
>> Right, I need to set up monitoring...
>
> Um, yeah. A RAID array won't prevent drives from going up in smoke,
> and if you don't know a drive has failed, you won't know you need to fix
> something - until a second drive fails.
>
>> > All may not be lost, however. First of all, take care when
>> > re-arranging not to lose track of which drive was which at the outset.
>> > In fact, other than the sda drive, you might be best served not to move
>> > anything. Take special care if the system re-assigns drive letters, as
>> > it can easily do.
>> So should I just "move" the A drive? and try to fire it back up?
>
> At this point, yeah. Don't lose track of from where and to where it
> has been moved, though.
>
>> > What are the contents of /etc/mdadm.conf?
>> >
>>
>> mdadm.conf contains this:
>> ARRAY /dev/md0 level=raid10 num-devices=4
>> UUID=3d93e545:c8d5baec:24e6b15c:676eb40f
>
> Yeah, that doesn't help much.
>
>> So, by re-creating, do you mean I should try to run the "mdadm --create"
>> command again the same way I did back when I created the array
>> originally? Will that wipe out my data?
>
> Not in and of itself, no. If you get the drive order wrong
> (different than when it was first created) and resync or write to the
> array, then it will munge the data, but all creating the array does is
> create the superblocks.
>
>> # smartctl -l selftest /dev/sda
>> smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
>>
>> Standard Inquiry (36 bytes) failed [No such device]
>> Retrying with a 64 byte Standard Inquiry
>> Standard Inquiry (64 bytes) failed [No such device]
>> A mandatory SMART command failed: exiting. To continue, add one or more
>> '-T permissive' options.
>
> Well, we kind of knew that. Either the drive is dead, or there is a
> hardware problem in the controller path. Hope for the latter, although a
> drive with a frozen platter can sometimes be resurrected, and if the drive
> electronics are bad but the servo assemblies are OK, replacing the
> electronics is not difficult. Otherwise, it's a goner.
>
>> # smartctl -l selftest /dev/sdb
>> smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART Self-test log structure revision number 1
>> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>> # 1  Extended offline    Completed: read failure       90%      7963         543357
>
> Oooh! That's bad. Really bad. Your earlier post showed the
> superblock is a 0.90 version. The 0.90 superblock is stored near the end
> of the partition. Your drive is suffering a heart attack when it gets
> near the end of the drive.
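As a quick sanity check on that point: the 0.90 superblock sits 64K-aligned
within the last 128K of the partition, so you can compute where it starts
and see how close the failure is. A rough, untested sketch (blockdev needs
the partition to still answer ioctls, and note that smartctl's
LBA_of_first_error counts from the start of the whole drive, not the
partition):

SECTORS=$(blockdev --getsz /dev/sdb1)   # partition size in 512-byte sectors
SB=$(( SECTORS / 128 * 128 - 128 ))     # 0.90 superblock start sector, per the md on-disk layout
echo "superblock begins at sector $SB of $SECTORS"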
> If you can't get your sda drive working again, then I'm
> afraid you've lost some data, maybe all of it. Trying to rebuild a
> partition from scratch when part of it is corrupted is not for the faint
> of heart. If you are lucky, you might be able to dd part of the sdb drive
> onto a healthy one and manually restore the superblock. That, or since
> the sda drive does appear in /dev, you might have some luck copying some
> of it to a new drive.
>
> Beyond that, you are either going to need the advice of someone who
> knows much more about md and Linux than I do, or else the services of a
> professional drive recovery expert. They don't come cheap.
>
>> This is strange, now I am getting info from mdadm --examine that is
>> different than before...
>
> It looks like sda may be responding for the time being. I suggest
> you try to assemble the array, and if successful, copy whatever data you
> can to a backup device. Do not mount the array as read-write until you
> have recovered everything you can. If some data is orphaned, it might be
> in the lost+found directory. If that's successful, I suggest you find out
> why you had two failures and start over. I wouldn't use a 0.90
> superblock, though, and you definitely want to have monitoring enabled.

If you have the spare drives/space, I -highly- recommend using dd_rescue or
ddrescue to copy the suspected-bad drives' contents to clean drives.

http://www.linuxfoundation.org/collaborate/workgroups/linux-raid/raid_recovery
has a script that tries out the drive-order combinations so you can see
which one loses the least data.
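For the ddrescue copy itself, what I run is roughly the following
(untested as typed here, with /dev/sdc standing in for whatever your clean
target drive actually is):

ddrescue -n /dev/sdb /dev/sdc /root/sdb-rescue.log    # first pass: grab the easy sectors, skip the rough spots
ddrescue -r3 /dev/sdb /dev/sdc /root/sdb-rescue.log   # second pass: retry the bad blocks a few times

The logfile is the important part: it lets you stop and resume, and it
records exactly which sectors never came back.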
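Once you are working from the copies, Leslie's assemble-and-copy-off step
would look something like this (a sketch only, assuming the four members
show up as sda1 through sdd1 on your system):

mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1   # add --force if a member is flagged stale
mount -o ro /dev/md0 /mnt/recovery    # stay read-only until everything is copied off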
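If the superblocks are too far gone for --assemble, what that script
automates is essentially a loop over create attempts like this (again a
sketch; the member order is the unknown you are searching for):

mdadm --create /dev/md0 --metadata=0.90 --level=10 --raid-devices=4 \
      --assume-clean /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1   # one candidate order; --assume-clean skips the resync
fsck -n /dev/md0    # read-only check; a wrong order shows up as garbage
mdadm --stop /dev/md0

Only ever do that against the rescued copies, never the originals.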
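As for the monitoring everyone keeps mentioning, the minimal setup is just
a mail address in mdadm.conf plus the monitor daemon (adjust the address,
obviously; most distros will start the daemon for you once it is set):

MAILADDR you@example.com             # add this line to /etc/mdadm.conf

mdadm --monitor --scan --daemonise   # mails you on Fail / DegradedArray events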