On Tue, 2009-07-14 at 10:14 -0400, Doug Ledford wrote:
> On Jul 14, 2009, at 11:02 AM, Hans de Goede wrote:
> > Hi,
> >
> > On 07/14/2009 03:39 PM, Doug Ledford wrote:
> >> On Jul 14, 2009, at 6:59 AM, Hans de Goede wrote:
> >>> Hi,
> >>>
> >>> As you probably know I'm working on making Fedora 12 use mdraid
> >>> instead of dmraid for Intel BIOS-RAID setups.
> >>>
> >>> The installer (anaconda) part is mostly done (needs more testing)
> >>> and now I'm looking at implementing support for this in dracut
> >>> (the new mkinitrd for Fedora 12).
> >>>
> >>> So I've been testing how this works for both imsm mdraid sets
> >>> and native mdraid metadata sets, in both cases using a 2 disk
> >>> mirror, so that the set can also be brought up in degraded mode.
> >>>
> >>> Currently the udev rules use incremental assembly like this:
> >>> mdadm -I /dev/mdraid-member
> >>
> >> Hmmm...does dracut use udev during initramfs time?
> >
> > Yes, it uses udev for everything, making discovery of / consistent
> > with the discovery of other storage devices.
>
> I'm not sure I like or agree with that philosophy. I absolutely
> *don't* want my / filesystem or raid device treated like some plug in,
> temporary, roaming raid device. They *aren't* the same, not in terms
> of importance to the running of the machine and not in terms of
> reliability requirements. By using mdadm -A in the mkinitrd calls, I
> was able to put in an mdadm.conf file and limit what arrays get
> started to arrays found non-ambiguously in that mdadm.conf file and
> identified by UUID. When you switch to incremental assembly for root,
> you risk the possibility of name space collisions and
> non-deterministic bring up of your / array.

I'm concerned about this too. To be more specific, I'm concerned about
both automatically assembling things like RAID arrays / LVM logical
volumes and also automounting devices [1].

Anyway, my point with all this is that maybe we are going about things
wrong in the initramfs.
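As a rough illustration of the mdadm.conf approach Doug describes (only start arrays that are listed unambiguously, by UUID), here is a minimal Python sketch of such a whitelist check. The function names are hypothetical, and the parsing is deliberately simplified (comments after '#' are dropped, continuation lines are not handled); this is an assumption-laden sketch, not how mdadm itself reads its config file.

```python
import re
from typing import Set

# Matches the UUID identity tag on an ARRAY line, e.g.
# "ARRAY /dev/md0 metadata=1.2 UUID=3aaa0122:29827cfa:5331ad66:ca767371"
_UUID_RE = re.compile(r"\buuid=([0-9a-f:]+)", re.IGNORECASE)


def allowed_array_uuids(mdadm_conf_text: str) -> Set[str]:
    """Collect the UUIDs of all ARRAY lines in mdadm.conf-style text.

    Simplified on purpose (hypothetical helper): comments after '#' are
    dropped and continuation lines are not handled.
    """
    uuids = set()
    for raw in mdadm_conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line.upper().startswith("ARRAY"):
            continue  # DEVICE, MAILADDR, blank lines, comments...
        match = _UUID_RE.search(line)
        if match:
            uuids.add(match.group(1).lower())
    return uuids


def should_assemble(array_uuid: str, allowed: Set[str]) -> bool:
    """Assemble only arrays explicitly listed by UUID; ignore everything else."""
    return array_uuid.lower() in allowed


if __name__ == "__main__":
    conf = """
DEVICE partitions
ARRAY /dev/md0 metadata=1.2 UUID=3aaa0122:29827cfa:5331ad66:ca767371
# ARRAY /dev/md1 UUID=deadbeef:00000000:00000000:00000000
"""
    allowed = allowed_array_uuids(conf)
    print(should_assemble("3AAA0122:29827CFA:5331AD66:CA767371", allowed))  # True
    print(should_assemble("deadbeef:00000000:00000000:00000000", allowed))  # False
```

The point of the design is that an array which merely shows up (a plugged-in disk, a stale member with a colliding name) is never started unless its UUID was written down when the initramfs was built.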
My understanding is that dracut roughly works this way (please let me
know if this is wrong):

 1. When generating the initramfs image, we leave information about the
    root filesystem on the kernel command line, typically its UUID,
    e.g. root=UUID=786263c4-5e28-4cdc-97b8-1ab6e221c344

 2. When the initramfs starts, we trigger all uevents and wait for
    things to settle.

 3. Autoassembly / magic:
    - If we see e.g. md components, we activate them via udev rules.
    - If we see e.g. LUKS devices, we unlock them via udev rules (by
      interacting with the user and asking for the passphrase).
    - Ditto for e.g. LVM.

 4. If we see the rootfs (matching on e.g. the UUID passed on the
    kernel command line), we create the /dev/root symlink.

 5. When the system has settled (e.g. no more uevents), we mount
    /dev/root and transition to non-early user space. If there is no
    /dev/root link, we bail out.

Now, my beef is with step 3 above. I think it is way too optimistic to
just auto-assemble / unlock etc. everything. For one thing, we end up
doing a lot of work unrelated to the rootfs that is better done in
non-early user space.

Instead, just like we specify the UUID of the rootfs on the command
line, we need to leave the initramfs logic instructions on _exactly_
which things should be assembled / unlocked / etc. in order to find the
rootfs. So the kernel command line wouldn't be "just" the UUID of the
rootfs; it would be a whole recipe of actions to perform, e.g.

  ROOTFS=UUID=1234 \       # this is the UUID of my rootfs
  MD_ASSEMBLE=UUID=4567 \  # assemble MD array with UUID 4567
  LUKS_UNLOCK=UUID=89ab    # unlock LUKS device with UUID 89ab

which would work for e.g. cases where the rootfs is on a LUKS device
which is on an MD array.

In other words, we'd need a whole "recipe" passed to the initramfs (the
mkinitrd tool would generate this recipe), not just the UUID of the
rootfs.

Incidentally, if we had something like this and the format of the
"recipe" was documented somewhere, it would be easy to e.g.
implement "rescue" functionality as described here

  http://www.redhat.com/archives/fedora-desktop-list/2009-July/msg00019.html

since graphical disk utilities would just find /etc/grub.conf (or
similar), read the recipe, and then start assembling / unlocking bits
and mount them as appropriate under /mnt/rescue/.

Actually this is very close to what Doug is asking for when he says
(paraphrased) "just include mdadm.conf instead of this magic". The key
difference, however, is that the user _won't_ have to use mdadm.conf or
care about config files; it's all taken care of by the mkinitrd binary
when building the recipe. Having one less config file to worry about
is a good thing.

Thanks for considering, and sorry for the long mail,
David

[1] : As some background information, I've spent a good chunk of my
life, five years or so, dealing with end users complaining about how
plain block devices got automounted when they were plugged in. FWIW,
the complaints range from the nonsensical (irritated users: "these
desktop kids shall not decide how UNIX works") to actual bugs where the
on-disk contents were misdetected and either something wrong got
automounted or we failed to automount at all. If I've learned anything,
it's that you need to be very, very careful here: unlike Windows and
other operating systems with such capabilities, Linux is.. different..
mostly because we support so many different ways to put a file system
through things like md and dm. And you need to make it very easy to
turn things like this off.
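A minimal Python sketch of the "recipe" idea proposed in this mail: it pulls the example keys ROOTFS, MD_ASSEMBLE and LUKS_UNLOCK out of a kernel command line and answers, for each discovered device, whether early user space should touch it at all. These keys are hypothetical and exist only in this proposal; the class and function names are likewise illustrative assumptions, not dracut code. It also assumes whitespace-separated entries with no quoting, as on a simple kernel command line. A rescue tool could feed it the kernel line read from /etc/grub.conf in the same way.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical recipe keys from the proposal in this mail; they are NOT
# existing dracut or kernel command-line parameters.
ACTION_KEYS = ("MD_ASSEMBLE", "LUKS_UNLOCK")


@dataclass
class Recipe:
    rootfs_uuid: Optional[str] = None
    # (action, UUID) pairs kept in command-line order, so the generator can
    # list an MD array before the LUKS device that sits on top of it.
    actions: List[Tuple[str, str]] = field(default_factory=list)


def parse_recipe(cmdline: str) -> Recipe:
    """Extract the hypothetical recipe entries from a kernel command line."""
    recipe = Recipe()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if not sep or not value.startswith("UUID="):
            continue  # not a recipe entry, e.g. "ro" or "quiet"
        uuid = value[len("UUID="):]
        if key == "ROOTFS":
            recipe.rootfs_uuid = uuid
        elif key in ACTION_KEYS:
            recipe.actions.append((key, uuid))
    return recipe


def action_for(recipe: Recipe, device_uuid: str) -> Optional[str]:
    """Return the action to perform for a discovered device, or None.

    Devices not named in the recipe are left alone in early user space
    instead of being auto-assembled or unlocked.
    """
    for action, uuid in recipe.actions:
        if uuid == device_uuid:
            return action
    return None


if __name__ == "__main__":
    r = parse_recipe("ro quiet ROOTFS=UUID=1234 MD_ASSEMBLE=UUID=4567 LUKS_UNLOCK=UUID=89ab")
    print(r.rootfs_uuid)          # 1234
    print(r.actions)              # [('MD_ASSEMBLE', '4567'), ('LUKS_UNLOCK', '89ab')]
    print(action_for(r, "4567"))  # MD_ASSEMBLE
    print(action_for(r, "ffff"))  # None
```

Keeping the actions ordered is what lets a recipe express stacking (MD array, then LUKS, then the rootfs), which matching only the rootfs UUID cannot do.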