Hello everybody,

I CC'ed Oleksandr Natalenko, as he is the initial reporter of the Arch
Linux issue. I cannot reproduce it myself.

Martin Wilck <mwilck@xxxxxxxx> on Wed, 2021/02/17 09:22:
> We have to differentiate here. In our case we had to wait for
> "systemd-udev-settle.service". In the arch case, it was only necessary
> to wait for systemd-udevd.service itself. "After=systemd-udevd.service"
> just means that the daemon is up, it says nothing about any device
> initialization being completed.

To quote from systemd.unit(5) about `Before=` and `After=`:

    Those two settings configure ordering dependencies between units.
    If unit foo.service contains the setting Before=bar.service and
    both units are being started, bar.service's start-up is delayed
    until foo.service has finished starting up. After= is the inverse
    of Before=, i.e. while Before= ensures that the configured unit is
    started before the listed unit begins starting up, After= ensures
    the opposite, that the listed unit is fully started up before the
    configured unit is started.

Let's keep this in mind. Now let's have a look at udevd startup: it
signals readiness by calling sd_notifyf(), but it loads rules and
applies permissions before doing so [0]. Even before that, there is
code handling events and monitoring. So I guess pvscan is started
during this initialization phase, before udevd signals readiness, and
obviously there is some kind of race condition. With the "After="
ordering in `lvm2-pvscan@.service`, the service start is queued during
the initialization phase, but the actual start and pvscan execution
are delayed until udevd has signaled readiness.

> But in general, I think this needs deeper analysis. Looking at
> https://bugs.archlinux.org/task/69611, the workaround appears to have
> been found simply by drawing an analogy to a previous similar case.
> I'd like to understand what happened on the arch system when the error
> occured, and why this simple ordering directive avoided it.
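The readiness argument hinges on when systemd considers a service "fully started up". A minimal sketch, assuming Linux and a Type=notify unit (which the sd_notifyf() call in udevd suggests) and written without libsystemd: the daemon sends the string READY=1 as a datagram to the AF_UNIX socket named in $NOTIFY_SOCKET, the protocol documented in sd_notify(3), and it does so only after its initialization work is done. The names notify_state(), load_rules() and daemon_startup() are hypothetical and are not udevd's actual code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: sends one sd_notify(3)-style state datagram.
 * Returns 1 on success, 0 if $NOTIFY_SOCKET is unset (not supervised
 * as a Type=notify unit), or a negative errno value on failure. */
static int notify_state(const char *state)
{
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un addr;
    socklen_t len;
    size_t n;
    ssize_t sent;
    int fd, abstract, err;

    assert(state != NULL);
    if (path == NULL || path[0] == '\0')
        return 0;

    n = strlen(path);
    /* Only absolute paths and '@'-prefixed abstract names are accepted. */
    if ((path[0] != '/' && path[0] != '@') || n >= sizeof(addr.sun_path))
        return -EINVAL;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, n);
    abstract = (path[0] == '@');
    if (abstract)
        addr.sun_path[0] = '\0';   /* leading NUL selects the abstract namespace */
    /* Abstract names are length-delimited; path names include the NUL. */
    len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + n + (abstract ? 0 : 1));

    fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -errno;
    sent = sendto(fd, state, strlen(state), MSG_NOSIGNAL,
                  (const struct sockaddr *)&addr, len);
    err = errno;                   /* close() may clobber errno */
    close(fd);
    return sent < 0 ? -err : 1;
}

/* Hypothetical stand-in for the daemon's initialization work
 * (loading rules, applying permissions, ...); always succeeds here. */
static int load_rules(void)
{
    return 0;
}

/* Readiness is reported only after initialization has completed, so
 * (assuming Type=notify) systemd holds back units ordered After= this
 * service until the READY=1 datagram arrives. */
static int daemon_startup(void)
{
    int r = load_rules();
    if (r < 0)
        return r;                  /* never report READY=1 if setup failed */
    return notify_state("READY=1");
}
```

Under these assumptions, a unit carrying After=systemd-udevd.service stays queued until that datagram is received, i.e. until the rules are loaded; without the directive its job can be dispatched while the daemon is still initializing.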
As said, I cannot reproduce it myself... Oleksandr, can you give more
details? Possibly everything from the journal regarding
systemd-udevd.service (and systemd-udevd.socket) and
lvm2-pvscan@*.service could help.

> 1. How had the offending pvscan process been started? I'd expect that
> "pvscan" (unlike "lvm monitor" in our case) was started by an udev
> rule. If udevd hadn't started yet, how would that udev rule have be
> executed? OTOH, if pvscan had not been started by udev but by another
> systemd service, than *that* service would probably need to get the
> After=systemd-udevd.service directive.

To my understanding it was started by udevd from a rule in
`69-dm-lvm-metad.rules`. (BTW, renaming that rule file may make sense
now that lvmetad is gone...)

> 2. Even without the "After=" directive, I'd assume that pvscan wasn't
> started "before" systemd-udevd, but rather "simultaneously" (i.e. in
> the same systemd transaction). Thus systemd-udevd should have started
> up while pvscan was running, and pvscan should have noticed that udevd
> eventually became available. Why did pvscan time out? What was it
> waiting for? We know that lvm checks for the existence of
> "/run/udev/control", but that should have become avaiable after some
> fractions of a second of waiting.

I do not think there is anything starting pvscan before udevd.

[0] https://github.com/systemd/systemd/blob/main/src/udev/udevd.c#L1807

-- 
main(a){char*c=/* Schoene Gruesse */"B?IJj;MEH"
"CX:;",b;for(a/* Best regards my address: */=0;b=c[a++];)
putchar(b-1/(/* Chris cc -ox -xc - && ./x */b/42*2-3)*42);}
_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/