RE: recovering from a controller failure

> -----Original Message-----
> From: linux-raid-owner@xxxxxxxxxxxxxxx [mailto:linux-raid-
> owner@xxxxxxxxxxxxxxx] On Behalf Of CoolCold
> Sent: Sunday, May 30, 2010 8:18 AM
> To: Leslie Rhorer
> Cc: Kyler Laird; linux-raid@xxxxxxxxxxxxxxx
> Subject: Re: recovering from a controller failure
> 
> On Sun, May 30, 2010 at 7:33 AM, Leslie Rhorer <lrhorer@xxxxxxxxxxx>
> wrote:
> >> On Sun, May 30, 2010 at 09:50:26AM +1200, Richard wrote:
> >>
> >> > How about adding entries to your mdadm.conf file containing the UUID
> >> > of /dev/md0, eg:
> >> >
> >> > ARRAY /dev/md8 level=raid6 num-devices=16
> >> > UUID=38a06a50:ce3fc204:728edfb7:4f4cdd43
> >> >
> >> > note this should be all one line.
> >>
> >> I'll be happy to do that.
> >>
> >> > mdadm -D /dev/md0 should get you the UUID.
> >>
> >>       root@00144ff2a334:/# mdadm -D /dev/md0
> >>       mdadm: md device /dev/md0 does not appear to be active.
> >>
> >> So...how do I get the UUIDs?  I tried blkid and got this.
> >>       http://lairds.us/temp/ucmeng_md/uuids
> >> Those UUIDs are far from unique.
> >
> >        After all your drives are visible, of course:
> >
> > `mdadm --examine /dev/sd* /dev/hd* > <filename>`
> > `more <filename>`
> >
> > Make note of the array UUID for each drive.  When done,
> >
> > `mdadm --assemble --assume-clean /dev/mdX /dev/<drive0> /dev/<drive1>
> > /dev/<drive2> ...etc`
> >
> > where <drive0>, <drive1>, etc are all members of the same array UUID.
> >
> >        Mount the file system, and fsck it.  Once everything is verified
> > good,
> >
> > `echo repair > /sys/block/mdX/md/sync_action`
> Taking into account that the "Events" fields differ between the disks on
> the 1st and 2nd controllers, an interesting question for me is: what will
> happen on this "repair"?

	Note that it should be --force, not --assume-clean.  The
--assume-clean switch would be used if you were re-creating the array, not
just re-assembling it.  Once the array is assembled, the repair function
will re-establish the redundancy within the array: any stripe whose
redundancy blocks do not match the values calculated from its data blocks
is re-written.
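
	A minimal sketch of the corrected sequence (the device and array
names are placeholders, so substitute your own, and the filesystem is
assumed to sit directly on the md device):

`mdadm --examine /dev/sd* | grep -i uuid    # group members by array UUID`
`mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdd`
`fsck -n /dev/md0                           # read-only check first`
`echo repair > /sys/block/md0/md/sync_action`
`cat /sys/block/md0/md/sync_action          # reports "idle" when finished`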

> And what does this "Events" field really mean? I didn't find a
> description in the man pages.

	I believe it serves a number of purposes.  For one thing, it is used
to keep track of which version of the data resides on each drive: the
counter is incremented whenever an array event occurs.  The event counts of
the members of an array should not differ by more than 1, or mdadm kicks
the stale drive out of the array.  I expect it may also be used during a
forced re-assembly and / or during a resync of a RAID1 array to help
determine which version of a stripe is correct.
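
	You can see the counter for yourself in the superblock dump (the
device name below is just an example):

`mdadm --examine /dev/sdb | grep -i events`

Members reporting lower counts than the rest hold stale data, which is why
--force is needed to assemble an array whose counters disagree.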


