On 28. 01. 22 at 17:02, Martin Wilck wrote:
On Fri, 2022-01-28 at 16:57 +0100, Martin Wilck wrote:
It's a race condition. It probably happens while multipathd is
reloading a map (*), suspending it during the table reload. The device
will be resumed a few fractions of a second later (so yes, it's
"quick"), but then it's too late.
More precisely: The suspend state itself may actually not last longer
than a few ms. But once the symlink is bent to point to the wrong
device, it will remain so, until the CHANGE event following the device
resume is successfully processed by udev, which may happen
substantially later. And it's that longer time span which matters for
systemd's mount attempt (or LVM device activation, for that matter).
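
To make the two states in this description concrete, here is a minimal
sketch of how one could observe them from user space. It assumes the
kernel exposes the suspend state as /sys/block/dm-N/dm/suspended and that
the symlink in question is an ordinary udev-managed link; the function
names and the sysfs_root parameter are hypothetical and are not part of
multipathd, udev or lvm2.

```python
import os


def dm_is_suspended(dm_name, sysfs_root="/sys"):
    """Return True if the dm device (e.g. "dm-0") is currently suspended.

    Assumption: the state is read from <sysfs_root>/block/<dm_name>/dm/suspended,
    which contains "1" while the device is suspended and "0" otherwise.
    """
    path = os.path.join(sysfs_root, "block", dm_name, "dm", "suspended")
    with open(path) as f:
        return f.read().strip() == "1"


def symlink_target_name(link_path):
    """Return the basename of the node the symlink currently resolves to."""
    return os.path.basename(os.path.realpath(link_path))


def link_points_to(link_path, expected_dm_name):
    """Return True if the symlink resolves to the expected dm node.

    A False result after the device has been resumed corresponds to the
    stale window described above: the suspend is over, but the CHANGE
    event that fixes the link has not been processed yet.
    """
    return symlink_target_name(link_path) == expected_dm_name
```

Polling both functions around a map reload would show whether the link
stays wrong for longer than the suspend itself lasts. This only observes
the symptom; it does not synchronize with udev.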
This looks like you are trying to mask a different synchronization bug.
It's also worth noting that relying on symlinks is somewhat doomed on its own.
So you only solve a very minor subcase, where you happen to hit the race
at exactly the moment when your device is suspended and you are actually
scanning the state of the device.
But what happens if the device has already been resumed?
It looks like there is a race in udev rules processing, just somewhere else.
I think Peter could shed more light on the lvm2 logic, but it seems that
similar logic may be missing on the multipath side at the moment when
devices are created?
There should be no race when switching from the ramdisk to the rootfs.
Regards
Zdenek
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel