On Wed, 2021-02-17 at 13:03 +0100, Christian Hesse wrote:
>
> Let's keep this in mind. Now let's have a look at udevd startup: It
> signals being ready by calling sd_notifyf(), but it loads rules and
> applies permissions before doing so [0].
> Even before we have some code about handling events and monitoring
> stuff.

It loads the rules, but events will only be processed after entering
sd_event_loop(), which happens after the sd_notify() call.

Anyway, booting the system with "udev.log-priority=debug" might provide
further insight. Oleksandr, could you try that (without the After=
directive)?

> So I guess pvscan is started in initialization phase before udevd
> signals being ready. And obviously there is some kind of race
> condition.

Right. Some uevent might arrive between the creation of the monitor
socket in monitor_new() and entering the event loop. Such an event would
be handled immediately, and possibly before systemd receives the
sd_notify message, so a race condition looks possible.

>
> With the ordering "After=" in `lvm2-pvscan@.service` the service
> start is queued at initialization phase, but actual start and pvscan
> execution is delayed until udevd signaled being ready.
>
> > But in general, I think this needs deeper analysis. Looking at
> > https://bugs.archlinux.org/task/69611, the workaround appears to
> > have been found simply by drawing an analogy to a previous similar
> > case. I'd like to understand what happened on the Arch system when
> > the error occurred, and why this simple ordering directive avoided
> > it.
>
> As said I can not reproduce it myself... Oleksandr, can you give more
> details?
> Possibly everything from journal regarding systemd-udevd.service (and
> systemd-udevd.socket) and lvm2-pvscan@*.service could help.
>
> > 1. How had the offending pvscan process been started? I'd expect
> > that "pvscan" (unlike "lvm monitor" in our case) was started by an
> > udev rule. If udevd hadn't started yet, how would that udev rule
> > have been executed? OTOH, if pvscan had not been started by udev
> > but by another systemd service, then *that* service would probably
> > need to get the After=systemd-udevd.service directive.
>
> To my understanding it was started from udevd by a rule in
> `69-dm-lvm-metad.rules`.
>
> (BTW, renaming that rule file may make sense now that lvm2-metad is
> gone...)
>
> > 2. Even without the "After=" directive, I'd assume that pvscan
> > wasn't started "before" systemd-udevd, but rather "simultaneously"
> > (i.e. in the same systemd transaction). Thus systemd-udevd should
> > have started up while pvscan was running, and pvscan should have
> > noticed that udevd eventually became available. Why did pvscan time
> > out? What was it waiting for? We know that lvm checks for the
> > existence of "/run/udev/control", but that should have become
> > available after some fractions of a second of waiting.
>
> I do not think there is anything starting pvscan before udevd.

I agree. The race described above looks at least possible.
I would go one step further and say that *every* systemd service that
might be started from an udev rule should have an
"After=systemd-udevd.service".

Martin

_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
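
For illustration only, the ordering discussed above could be tried
locally with a systemd drop-in rather than by editing the shipped unit.
This is a minimal sketch, not the change that was applied upstream or in
Arch: the drop-in file name and its location under /etc/systemd/system
are assumptions chosen for the example. Only the unit names
`lvm2-pvscan@.service` and `systemd-udevd.service` and the `After=`
directive come from the thread.

```ini
# Hypothetical drop-in file (name chosen for this example):
#   /etc/systemd/system/lvm2-pvscan@.service.d/10-after-udevd.conf
# A drop-in directory for a template unit applies to every instance
# of lvm2-pvscan@.service.
[Unit]
# Ordering only: the start job may still be queued early, but it is not
# executed until systemd-udevd.service has signalled readiness.
After=systemd-udevd.service
```

After= only orders the two units; it does not pull systemd-udevd.service
in as a dependency. A "systemctl daemon-reload" would be needed before
the drop-in takes effect.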