Re: RFC: mdadm and bringing up raid sets from initrd (dracut)

On Jul 14, 2009, at 6:59 AM, Hans de Goede wrote:
Hi,

As you probably know I'm working on making Fedora 12 use mdraid
instead of dmraid for Intel BIOS-RAID setups.

The installer (anaconda) part is mostly done (needs more testing)
and now I'm looking at implementing support for this in dracut
(the new mkinitrd for Fedora 12).

So I've been testing how this works for both imsm mdraid sets
and native mdraid metadata sets, in both cases using a 2 disk
mirror, so that the set can also be brought up in degraded mode.

Currently the udev rules use incremental assembly like this:
mdadm -I /dev/mdraid-member

Hmmm...does dracut use udev at initramfs time? mkinitrd didn't, so this would be a change. In particular, I didn't have these problems with mkinitrd because I didn't use udev rules in the initrd; I ran mdadm -A instead. In fact, the F11 method of bringing up raid devices is as follows:

initrd: use mdadm -As --run <md device name with a matching ARRAY entry
  in /etc/mdadm.conf>

rc.sysinit: use mdadm -As --run (no md device name, which means all
  arrays listed in mdadm.conf get brought up, plus any arrays not listed
  in mdadm.conf that can be found and identified by their metadata)

udev: in 65-md-incremental.rules use mdadm -I <block device>, but only
  if /dev/.in.rcsysinit does not exist. So the udev incremental rule is
  not run until the system is up and running, which means it only
  applies to hot plugged devices. In particular, the udev rule never
  runs on a device that was present at boot; the previous two calls
  catch those devices instead, and those calls will run degraded
  arrays. This lets me safely refuse to run degraded arrays in the udev
  rules file without risking a failed boot. A degraded hot plugged
  array needs minor manual intervention, but the system will be fully
  up and operational no matter what.

I find this setup to be a rather safe, conservative way of handling md raid array hot plug. Are we going to be totally changing this with dracut and F12? This method very nicely resolves the issues you posted.
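
As a rough shell sketch of the udev-side guard described above (this is not the actual contents of 65-md-incremental.rules; the helper name md_hotplug_add and the MDADM and RCSYSINIT_FLAG variables are made up for illustration, and are overridable so the logic can be dry-run without touching real arrays):

```sh
#!/bin/sh
# Hypothetical helper that a udev rule could call for a hot plugged
# md member device. All names here are assumptions for illustration.
MDADM=${MDADM:-mdadm}
RCSYSINIT_FLAG=${RCSYSINIT_FLAG:-/dev/.in.rcsysinit}

md_hotplug_add() {
    dev=$1
    # While rc.sysinit is still running, do nothing: devices present at
    # boot are handled by the "mdadm -As --run" calls instead.
    if [ -e "$RCSYSINIT_FLAG" ]; then
        return 0
    fi
    # After boot: plain incremental add, without --run, so an incomplete
    # native-metadata array is left for manual intervention.
    $MDADM -I "$dev"
}
```

Running it with MDADM="echo mdadm" prints the command that would be executed instead of running it, which is a cheap way to check the flag-file logic.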

There are 2 problems with this:
1) When doing this for native mdraid metadata arrays, if only
  one disk is present the set never gets activated
2) When doing this for imsm metadata arrays, as soon as the
  first disk is incrementally added, the set gets activated
  in degraded mode and stays that way, the second disk
  will get added to the container, but not to the actual
  sets in the container

And these 2 problems have 2 different solutions:
1) An incomplete, but potentially activatable in degraded mode
  set can be activated using mdadm --run /dev/md#
2) One can stop this problem by using:
  mdadm -I --no-degraded /dev/mdraid-member
  instead (this does not change anything for
  native mdraid metadata format sets)
  But if that is done, the sets in the container never get
  activated, this can be fixed by running
  mdadm -I /dev/md# on the container device

So my proposed solution for this is when udev is done scanning
(when the event queue is empty, detected using the same mechanism as
dracut is using for dmraid), do the following:

For each /dev/md#
 run mdadm --export --detail, and get the MD_LEVEL
 if MD_LEVEL == "container":
   mdadm -I /dev/md#
 else
   mdadm --run /dev/md#

This will:
1) Bring up raid sets inside containers (such as imsm raidsets)
2) Bring up incomplete raid sets in degraded mode where possible
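
A minimal shell sketch of the loop described above, not the actual patch. The function names and the overridable MDADM variable are assumptions for illustration, and the MD_LEVEL=container value is taken from the description above:

```sh
#!/bin/sh
# Sketch only: function names and the MDADM override are hypothetical.
MDADM=${MDADM:-mdadm}

# Read "mdadm --detail --export" output on stdin, print the MD_LEVEL value.
md_level_from_export() {
    sed -n 's/^MD_LEVEL=//p'
}

# Start one md device: assemble the sets inside a container, or
# force-start a possibly incomplete set in degraded mode.
md_start_one() {
    node=$1
    level=$($MDADM --detail --export "$node" | md_level_from_export)
    if [ "$level" = "container" ]; then
        $MDADM -I "$node"
    else
        $MDADM --run "$node"
    fi
}

# Walk every /dev/md# block device; meant to be called once, after
# udev has finished scanning.
md_start_all() {
    for md in /dev/md[0-9]*; do
        [ -b "$md" ] || continue
        md_start_one "$md"
    done
}
```

The block only defines functions; md_start_all would be called at the point where the udev event queue is known to be empty, however that is detected.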

I'll post a patch implementing this later today.

Regards,

Hans


--

Doug Ledford <dledford@xxxxxxxxxx>

GPG KeyID: CFBFF194
http://people.redhat.com/dledford

InfiniBand Specific RPMS
http://people.redhat.com/dledford/Infiniband



