On Thu, 2020-12-17 at 23:48 -0600, Benjamin Marzinski wrote:
> On Thu, Dec 17, 2020 at 12:00:15PM +0100, mwilck@xxxxxxxx wrote:
> > From: Martin Wilck <mwilck@xxxxxxxx>
> >
> > We've recently observed various cases of incompletely processed
> > uevents during initrd processing. Typically, this would leave a dm
> > device in the state it had after the initial "add" uevent, which is
> > basically unusable, because udevd had been killed by systemd before
> > processing the subsequent "change" event. After switching root, the
> > coldplug event would re-read the db file, which would be in an
> > unusable state, and would not do anything. In such cases, a RELOAD
> > action with force_udev_reload=1 is in order to make udev re-process
> > the device completely (DM_UDEV_PRIMARY_SOURCE_FLAG=1 and
> > DM_SUBSYSTEM_UDEV_FLAG0=0).
> >
> > The previous commits
> >
> > 2b25a9e libmultipath: select_action(): force udev reload for
> >         uninitialized maps
> > cb10d38 multipathd: uev_trigger(): handle incomplete ADD events
> >
> > addressed the same issue, but incompletely. They would miss cases
> > where the map was configured correctly but none of the RELOAD
> > criteria were met. This patch partially reverts 2b25a9e by
> > converting select_reload_action() into a trivial helper. Instead,
> > we now check for an incompletely initialized udev state before
> > checking any of the other reload criteria.
>
> I'll review this patch tomorrow, but are you able to reproduce this?

Not me, but multiple customers of ours :-/ Most of them were running
PowerPC, for reasons I can only speculate about.

The user-visible phenomenon is that some upper layers on some maps
(kpartx-created partitions, LVM, ...) are not present after boot, and
"multipathd reload" fixes the situation.

I suppose it should be reproducible if one has multiple multipath maps
with partitions, devices are discovered somewhat slowly / with delays
during initrd processing, and the root device is discovered early on.
Then systemd has enough time to mount the root FS and stop services
before all maps are completely set up. The exact behavior would still
depend on timing (but in the last case I worked on, it was 100%
reproducible by the customer).

> I've seen something similar, except in the case I saw, multipathd
> took too long during the initial configuration, and systemd shut
> things down for the switch-root before multipath could finish
> creating the devices. I was thinking of trying to solve cases like
> this by forcing some ordering on multipathd stopping in the
> initramfs, with something like
>
> Before=initrd-cleanup.service
> Conflicts=initrd-cleanup.service
>
> for the multipathd.service file for the initramfs. The goal is to
> make sure that things don't get shut down until multipathd has
> stopped. This would keep multipath from creating devices when udev
> isn't around to deal with them. Did you try something like this?

No, I didn't think of that. It's an interesting idea, although it
might slow down booting.
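Just to make sure I understand: I suppose that would amount to a
drop-in roughly like the sketch below for the multipathd.service copy
inside the initramfs (untested; the exact file name and location
depend on the initramfs tooling):

    # Hypothetical initramfs-only drop-in for multipathd.service,
    # e.g. multipathd.service.d/initrd-ordering.conf
    [Unit]
    # Conflicts= means that starting initrd-cleanup.service queues a
    # stop job for multipathd; Before= orders that stop (ordering is
    # reversed on shutdown) so that multipathd has fully stopped
    # before initrd-cleanup.service starts.
    Before=initrd-cleanup.service
    Conflicts=initrd-cleanup.service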
IMO it's actually a good thing that the services in the initrd are
stopped quickly when the root FS becomes available. It can just have
some side effects our current code doesn't deal well with.

AFAICS, multipathd is not the problem, udev is. I can see that
multipathd has cleanly set up the maps, but udev has been stopped
before processing the respective events (in particular, the change
events). If this happens, the udev db for the affected maps looks
more or less like this:

DM_UDEV_DISABLE_SUBSYSTEM_RULES_FLAG="1"
DM_UDEV_DISABLE_DISK_RULES_FLAG="1"
DM_UDEV_DISABLE_OTHER_RULES_FLAG="1"

The device-mapper hotplug mechanism doesn't help here, because it
tries to import properties from the db. Triggering uevents in other
ways helps even less. Only a "genuine" change event without the
"reload" flag (DM_SUBSYSTEM_UDEV_FLAG0) set will do the trick. When
multipathd starts after switching root, it sees the maps in perfect
state as far as multipath properties are concerned, and thus will not
set the force_udev_reload flag. This patch changes that.

(My break-through step when I attempted to understand the issue was
to tell customers to tar up /run/udev/data before starting coldplug.
That way I could see the state in which the udev db was left when the
initrd finished, and I saw half-completed entries like the ones
above.)

I suppose this works differently on Red Hat, where you use mpathconf
and set up only a very limited set of maps during initrd processing.
Therefore my guess was that you'd not see this at all.

I'm still wondering why we have been seeing it only very recently.
*Perhaps* my recent changes to make multipathd shut down more quickly
are part of the equation; I'm unsure about that. I am pretty positive
that this patch is effective, though.

Regards,
Martin

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel