On Fri, 2024-01-12 at 19:51 -0500, Benjamin Marzinski wrote:
> Booting a machine with a multipathed kpartx root device fails for me
> using the fedora rawhide multipath packages, which are based on the
> 0.9.7 release. Using LVM on top works. The issue is that when the root
> device is directly on a partition, dracut finds it on one of the path
> devices, and starts using that. If multipathd isn't running when the
> uevent for that path device is processed, it won't be claimed by
> multipath (starting in 0.9.7), since there is no multipathd.socket in
> the initramfs and no systemd_service_enabled(). Afterwards, multipathd
> creates a multipath device on top of the device, claims it, and removes
> the partitions. If this happens while dracut is attempting to mount the
> root device, the boot fails. In practice, it usually failed for me.
>
> Reverting 6fad1464 ("libmpathutil: remove systemd_service_enabled()")
> resolves the problem. When I tried to add
>
>   Before=dracut-pre-mount.service
>
> to dracut's version of multipathd.service instead, it works over 95% of
> the time, but it still occasionally fails. The issue is that even though
> multipathd will create the multipath device before signaling that it
> has started up, meaning that dracut won't start working towards
> mounting the root device until after the multipath device exists, dracut
> won't know to not use the underlying device partition until it processes
> the uevents that get triggered by multipathd creating the device. And it
> won't be able to use the kpartx device until it processes the uevents
> that get triggered by kpartx running when processing the multipath
> device uevents. Depending on how quickly dracut processes these events
> relative to the rest of the bootup work, it can still hang. I've tested
> adding
>
>   Before=systemd-udev-trigger.service
>
> to multipathd.service with no failures so far.
> This requires fixing
> multipathd-configure.service, so that there aren't any dependency
> conflicts, but that should happen anyway. I need to talk to the CoreOS
> people who added this, but I think the only necessary dependency for
> multipathd-configure.service to come after is

Disclaimer: I have no experience with multipathd-configure.service.

>
>   After=dracut-cmdline.service
>
> With this, I think that multipathd should always be running before
> device uevents get processed, but perhaps it needs to be before
> systemd-udevd.service instead.

Yes indeed. I thought this was already the case with the "recent"
changes made to dracut's multipath module:

297525c fix(multipath): remove dependency on multipathd.socket
6246da4 fix(multipathd.service): drop dependencies on iscsi and iscsid
a247d2b fix(multipathd.service): adapt to upstream multipath-tools unit file
371b338 fix(multipathd.service): remove dependency on systemd-udev-settle

Basically multipathd.service should have (almost) no "After="
dependencies, making it start very early during boot, and definitely
before systemd-udev-trigger.service. Actually, it should start up after
systemd-udev-socket.service, but before systemd-udevd.service. This way
we'd ensure that we don't miss any uevents.

I don't quite understand why this wasn't the case for you. Was it caused
by multipathd-configure.service and its dependencies?

(Note also my pending dracut PR
https://github.com/dracutdevs/dracut/pull/2563#issuecomment-1823525208
where I'm trying to get rid of the dracut-specific multipathd.service
file).

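For illustration, the ordering described above could be sketched as a
unit-file fragment like the one below. This is an untested sketch, not
the contents of any shipped unit file: the drop-in path and file name
are made up, and it assumes that the udev socket unit meant above is
systemd-udevd-kernel.socket.

```ini
# Hypothetical drop-in, e.g.
# /etc/systemd/system/multipathd.service.d/ordering.conf
# (path and file name are illustrative only).
[Unit]
# Assumption: this is the udev socket unit referred to above.
After=systemd-udevd-kernel.socket
# Start before udevd begins processing uevents, and before the
# coldplug trigger replays them.
Before=systemd-udevd.service systemd-udev-trigger.service
```

For the boot problem discussed here, such a fragment would only have an
effect if it ends up inside the initramfs image.
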
> If it's not possible to guarantee that multipathd has started before we
> process uevents so that we always claim the path devices as soon as they
> appear, then to close this race window, we need to either wait after
> multipathd starts for all the uevents to settle (and I don't think we
> want to get back into the business of relying on udev-settle), or to go
> back to some method of making multipath able to claim devices before
> multipathd starts.

I don't think this will be necessary. We just need to get the
dependencies right. Your example shows, though, that it might be
sufficient to just add another service (here I suspect
multipathd-configure.service) to mess up the deps.

We can consider adding an explicit Before=systemd-udevd.service to our
unit file. This way it'd be guaranteed that we start up before udevd,
and if some other unit got it wrong, systemd should report a dependency
cycle.

Regards
Martin