Re: Discussion: performance issue on event activation mode

On 9/30/21 09:51, Martin Wilck wrote:
On Thu, 2021-09-30 at 00:06 +0200, Peter Rajnoha wrote:
On Tue 28 Sep 2021 12:42, Benjamin Marzinski wrote:
On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
I have pondered this quite a bit, but I can't say I have a concrete plan.

To avoid depending on "udev settle", multipathd needs to partially revert to udev-independent device detection. At least during initial startup, we may encounter multipath maps with members that don't exist in the udev db, and we need to deal with this situation gracefully. We currently don't, and it's a tough problem to solve cleanly. Not relying on udev opens up a Pandora's box wrt WWID determination, for example. Any such change would without doubt carry a large risk of regressions in some scenarios, which we wouldn't want to happen in our large customer's data centers.

I'm not actually sure that it's as bad as all that. We just may need a way for multipathd to detect if the coldplug has happened. I'm sure if we say we need it to remove the udev settle, we can get some method to check this. Perhaps there is one already, that I don't know about. If

The coldplug events are synthesized, and as such they all now contain a SYNTH_UUID=<UUID> key-value pair with kernel >= 4.13:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/sysfs-uevent
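
For illustration, a minimal sketch (in Python, purely as an example; the device path is hypothetical) of what the ABI document above describes: on kernel >= 4.13, writing "ACTION UUID" to a device's uevent file makes the resulting synthetic uevent carry SYNTH_UUID=<that UUID>, which rules can then match:

  # Minimal sketch: emit a synthetic "change" uevent tagged with a UUID
  # (kernel >= 4.13, per the sysfs-uevent ABI document above).
  # /sys/block/sda is only an example device.
  import uuid

  synth_uuid = uuid.uuid4()

  # The kernel parses "ACTION [UUID]" and puts SYNTH_UUID=<UUID> into the
  # environment of the generated uevent.
  with open("/sys/block/sda/uevent", "w") as f:
      f.write(f"change {synth_uuid}\n")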

I've already tried to propose a patch for systemd/udev that would mark all uevents coming from the trigger (including the one used at boot for coldplug) with an extra key-value pair that we could easily match in rules, but that was not accepted. So right now, we can detect that a synthesized uevent happened, though we can't be sure it was the actual udev trigger at boot. For that, we'd need the extra marks. I can give it another try though; maybe if there are more people asking for this functionality, we'll be in a better position for it to be accepted.

That would allow us to discern synthetic events, but I'm unsure how this would help us. Here, what matters is figuring out when we don't expect any more of them to arrive.


I think this would require a different approach on the systemd/udev side. Currently, "udevadm trigger --settle" uses a different UUID for each synthesized uevent's SYNTH_UUID. That is actually not exactly how it was meant to be used. Instead, the SYNTH_UUID was also meant to be used as a form of grouping - so in the case of "udevadm trigger", there should be a single UUID used to group all the generated uevents. Then, this logic could be enhanced so that a different SYNTH_UUID is used for each subsystem (e.g. block), and hence we could wait for each subsystem's devices separately, without being held up waiting for anything else.
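
To make the grouping idea concrete, here is a hypothetical sketch (this is not what udevadm trigger does today; the paths and the per-subsystem grouping are assumptions for illustration only) of stamping all block devices with one shared UUID, so that a later settle step could wait for exactly that group of synthesized events:

  # Hypothetical sketch: trigger synthetic "change" uevents for every
  # block device, all tagged with the same SYNTH_UUID (kernel >= 4.13).
  import glob
  import uuid

  group_uuid = uuid.uuid4()  # one UUID shared by the whole "block" group

  for uevent_path in glob.glob("/sys/class/block/*/uevent"):
      try:
          with open(uevent_path, "w") as f:
              f.write(f"change {group_uuid}\n")
      except OSError:
          # Devices can disappear while we iterate; just skip them.
          pass

A per-subsystem settle would then only need to track the events carrying that particular UUID.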

So then we could have services like:
  systemd-udev-settle-block.service
  systemd-udev-settle-othersubsystem.service
  ...

And then place our services after that. We'd need to elaborate a bit on whether a more fine-grained separation would be needed or not...

If we see this udev settle as the key point, then I think we should probably concentrate on enhancing systemd/udev to provide this functionality (primarily the udevadm trigger side and the waiting for the related synthesized events). I think the infrastructure to accomplish this is already there; it just needs suitable user-space changes (in udevadm trigger).

I guess it would be possible to compare the list of (interesting)
devices in sysfs with the list of devices in the udev db. For
multipathd, we could

  - scan set U of udev devices on startup
  - scan set S of sysfs devices on startup

Well, I think that's exactly the functionality that could be provided by the settle separation as described above... And then everybody could benefit from this.

  - listen for uevents for updating both S and U
  - after each uevent, check if the difference set of S and U is empty
  - if yes, coldplug has finished
  - otherwise, continue waiting, possibly until some timeout expires
    (a rough sketch of this loop follows below).
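
Purely as an illustration of that loop, a rough sketch in Python using the third-party pyudev bindings (an assumption; multipathd itself would do this in C via libudev). A block device that sysfs knows about but that udev has not initialized yet is exactly an element of S \ U:

  # Rough sketch: wait until every block device visible in sysfs has an
  # entry in the udev db, i.e. until the difference set S \ U is empty.
  import pyudev

  context = pyudev.Context()

  monitor = pyudev.Monitor.from_netlink(context)
  monitor.filter_by('block')
  monitor.start()  # start listening before the initial scan to avoid races

  def missing_from_udev_db():
      """Block devices present in sysfs (S) but not yet in the udev db (U)."""
      return [d.sys_name for d in context.list_devices(subsystem='block')
              if not d.is_initialized]

  pending = missing_from_udev_db()
  while pending:
      # Wait (with a timeout) for the next uevent, then re-check S \ U.
      if monitor.poll(timeout=30) is None:
          print("timed out, still uninitialized:", pending)
          break
      pending = missing_from_udev_db()
  else:
      print("udev db covers all of sysfs; coldplug has probably finished")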

It's more difficult for LVM because you have no daemon maintaining
state.

Martin

--
Peter
