Re: recovering from a controller failure

On Mon, May 31, 2010 at 2:38 AM, Leslie Rhorer <lrhorer@xxxxxxxxxxx> wrote:
>
>
>> -----Original Message-----
>> From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-
>> owner@xxxxxxxxxxxxxxx] On Behalf Of CoolCold
>> Sent: Sunday, May 30, 2010 8:18 AM
>> To: Leslie Rhorer
>> Cc: Kyler Laird; linux-raid@xxxxxxxxxxxxxxx
>> Subject: Re: recovering from a controller failure
>>
>> On Sun, May 30, 2010 at 7:33 AM, Leslie Rhorer <lrhorer@xxxxxxxxxxx>
>> wrote:
>> >> On Sun, May 30, 2010 at 09:50:26AM +1200, Richard wrote:
>> >>
>> >> > How about adding entries to your mdadm.conf file containing the UUID
>> >> > of /dev/md0, eg:
>> >> >
>> >> > ARRAY /dev/md8 level=raid6 num-devices=16
>> >> > UUID=38a06a50:ce3fc204:728edfb7:4f4cdd43
>> >> >
>> >> > note this should be all one line.
>> >>
>> >> I'll be happy to do that.
>> >>
>> >> > mdadm -D /dev/md0 should get you the UUID.
>> >>
>> >>       root@00144ff2a334:/# mdadm -D /dev/md0
>> >>       mdadm: md device /dev/md0 does not appear to be active.
>> >>
>> >> So...how do I get the UUIDs?  I tried blkid and got this.
>> >>       http://lairds.us/temp/ucmeng_md/uuids
>> >> Those UUIDs are far from unique.
>> >
>> >        After all your drives are visible, of course:
>> >
>> > `mdadm --examine /dev/sd* /dev/hd* > <filename>`
>> > `more <filename>`
>> >
>> > Make note of the array UUID for each drive.  When done,
>> >
>> > `mdadm --assemble --assume-clean /dev/mdX /dev/<drive0> /dev/<drive1>
>> > /dev/<drive2> ...etc`
>> >
>> > where <drive0>, <drive1>, etc are all members of the same array UUID.
>> >
>> >        Mount the file system, and fsck it.  Once everything is verified
>> > good,
>> >
>> > `echo repair > /sys/block/mdX/md/sync_action`
>> Taking into account that the "Events" fields differ between the disks on
>> the 1st and 2nd controllers, the interesting question for me is: what
>> will happen on this "repair"?
>
>        Note that it should be --force, not --assume-clean.  The
> --assume-clean switch would be used if you re-created the array, not just
> re-assembled it.  Once the array is assembled, the repair function will
> re-establish the redundancy within the array.  Any stripe whose parity
> does not match the value calculated from its data blocks is re-written.
That's it - as you can see, there are 15 drives in the raid6 array.
--examine on the disks from sda to sdh shows the drives as active with an
event count of 0.159, while sdi to sdp have an event count of 0.168 and
show sd[a-i] as faulty (an example of pulling those fields out is sketched
below). So I'm guessing there is no way to know which part of the array is
"right", and I guess the two halves are desynced.

>
>> And what does this "Events" field really mean? I didn't find a
>> description in the man pages.
>
>        I believe a number of things.  For one thing, the counter is
> incremented whenever an array event occurs, so it keeps track of which
> version of the data resides on each drive.  The values of the events
> counter across the members of an array should not differ by more than 1,
> or mdadm kicks the drive out of the array.
I thought something similar, but interestingly, in this situation the
drives have event count values like "0.168" and "0.159"...
> I expect it may also be used during forced re-assembly and /
> or during a resync of a RAID1 system to help determine which version of a
> stripe is correct.
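
In case it helps anyone hitting this later: my understanding is that a
forced assembly along the lines of (device list assumed from the --examine
output above, so treat this as a sketch rather than a tested command)

`mdadm --assemble --force /dev/md0 /dev/sd[a-p]`
`cat /proc/mdstat`

makes mdadm trust the members with the newest event count and update the
stale superblocks so the array can start, after which the fsck and the
`echo repair > /sys/block/md0/md/sync_action` step from earlier in the
thread would re-establish redundancy.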



-- 
Best regards,
[COOLCOLD-RIPN]