Re: Raid-10 mount at startup always has problem

Doug Ledford <dledford@xxxxxxxxxx> · Mon, 29 Oct 2007 11:47:19 -0400

On Mon, 2007-10-29 at 09:18 +0100, Luca Berra wrote:
> On Sun, Oct 28, 2007 at 10:59:01PM -0700, Daniel L. Miller wrote:
> >Doug Ledford wrote:
> >>Anyway, I happen to *like* the idea of using full disk devices, but the
> >>reality is that the md subsystem doesn't have exclusive ownership of the
> >>disks at all times, and without that it really needs to stake a claim on
> >>the space instead of leaving things to chance IMO.
> >>   
> >I've been re-reading this post numerous times - trying to ignore the 
> >burgeoning flame war :) - and this last sentence finally clicked with me.
> >
> I am sorry Daniel, when i read Doug and Bill, stating that your issue
> was not having a partition table, i immediately took the bait and forgot
> about your original issue.

I never said *his* issue was the lack of a partition table; I just said
I don't recommend running without one because it's flaky.  The last
thing I asked about his issue was whether the problem was happening at
initrd time or sysinit time, to identify whether it was failing before
or after / was mounted and so narrow down where the issue might lie.
Then we got off on the tangent about partitions, and at the same time
Neil started asking about udev, at which point it came out that he's
running Ubuntu.  As much as I would like to help, I've never touched
Ubuntu and wouldn't have the faintest clue, so I let Neil handle it.
He then found that the udev scripts in Ubuntu are being stupid, and
from the looks of it they are the cause of the problem.  So I've
considered the initial issue root-caused for a bit now.

> like udev/hal that believes it knows better than you about what you have
> on your disks.
> but _NEITHER OF THESE IS YOUR PROBLEM_ imho

Actually, it looks like udev *is* the problem, but not because of
partition tables.

> I am also sorry to say that i fail to identify what the source of your
> problem is, we should try harder instead of flaming between us.

We can do both, or at least I can :-P

> Is it possible to reproduce it on the live system
> e.g. unmount, stop array, start it again and mount.
> I bet it will work flawlessly in this case.
> then i would disable starting this array at boot, and start it manually
> when the system is up (stracing mdadm, so we can see what it does)
> 
> I am also wondering about this:
> md: md0: raid array is not clean -- starting background reconstruction
> does your system shut down properly?
> do you see the message about stopping md at the very end of the
> reboot/halt process?

The root cause is that udev adds his SATA devices one at a time, and on
each add it invokes mdadm to see if there is an array to start, without
using mdadm's incremental mode.  As a result, as soon as 3 of the 4
disks are present, mdadm starts the array in degraded mode.  The message
about being unable to set the array info is probably the result of a
race between the mdadm started for the third disk and the mdadm started
for the fourth disk.  The one losing the race gets the error because the
other one has already manipulated the array (for example, the 4th disk's
mdadm could be trying to add the first disk to the array, but it's
already there, so it gets this error and bails).
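
To make the difference concrete, here is a toy Python model of the two
behaviours.  It is only a sketch of the description above, not the
actual Ubuntu udev rule or mdadm's logic: the function names, the device
names, and the 3-of-4 threshold are assumptions for illustration.

```python
def start_on_every_add(arrival_order, min_to_start):
    """Model an assemble attempt on every hotplug event.

    The array is started as soon as enough members are present to run
    degraded.  Returns (members_at_start, late_disks), where late_disks
    are the devices that showed up after the array was already running.
    """
    present = []
    members_at_start = None
    late_disks = []
    for disk in arrival_order:
        present.append(disk)
        if members_at_start is None:
            if len(present) >= min_to_start:
                members_at_start = len(present)  # started, maybe degraded
        else:
            late_disks.append(disk)  # arrives after the array is running
    return members_at_start, late_disks


def start_incremental(arrival_order, total_members):
    """Model incremental-style assembly: collect members as they appear
    and start only once every expected member is present."""
    present = []
    for disk in arrival_order:
        present.append(disk)
        if len(present) == total_members:
            return len(present)
    return None  # never saw all members, so never started


disks = ["sda", "sdb", "sdc", "sdd"]  # hypothetical device names
print(start_on_every_add(disks, min_to_start=3))
print(start_incremental(disks, total_members=4))
```

In the first case the array comes up with 3 members and sdd arrives
late, which is the degraded start described above; in the second case
nothing starts until all 4 are there.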

So, as much as you might dislike mkinitrd since 5.0, Luca, it doesn't
have this particular problem ;-)  In the initrd we produce, it loads all
the SCSI/SATA/etc drivers first, then calls mkblkdevs, which forces all
of the devices to appear in /dev, and only then does it start the
mdadm/lvm configuration.  Daniel, I make no promises whatsoever that
this will even work at all, as it may fail to load modules or hit all
sorts of other weirdness.  But if you want to test the theory, you can
download the latest mkinitrd from fedoraproject.org and use it to create
an initrd image under some name other than your default image name.
Then manually edit your boot loader configuration to add an extra stanza
that uses the mkinitrd-generated initrd image instead of the Ubuntu
image, and just see if it brings the md device up cleanly instead of in
degraded mode.  That should be a fairly quick and easy way to test
whether Neil's analysis of the udev script was right.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband