Re: Discussion: performance issue on event activation mode

On Tue, Sep 28, 2021 at 03:16:08PM +0000, Martin Wilck wrote:
> On Tue, 2021-09-28 at 09:42 -0500, David Teigland wrote:
> > On Tue, Sep 28, 2021 at 06:34:06AM +0000, Martin Wilck wrote:
> > > Hello David and Peter,
> > > 
> > > On Mon, 2021-09-27 at 10:38 -0500, David Teigland wrote:
> > > > On Mon, Sep 27, 2021 at 12:00:32PM +0200, Peter Rajnoha wrote:
> > > > > > - We could use the new lvm-activate-* services to replace the
> > > > > >   activation generator when lvm.conf event_activation=0.  This
> > > > > >   would be done by simply not creating the event-activation-on
> > > > > >   file when event_activation=0.
> > > > > 
> > > > > ...the issue I see here is around the systemd-udev-settle:
> > > > 
> > > > Thanks, I have a couple questions about the udev-settle to
> > > > understand that better, although it seems we may not need it.
> > > > 
> > > > >   - the setup where lvm-activate-vgs*.service are always there
> > > > >     (not generated only on event_activation=0 as it was before
> > > > >     with the original lvm2-activation-*.service) practically
> > > > >     means we always make a dependency on
> > > > >     systemd-udev-settle.service, which we shouldn't do in case
> > > > >     we have event_activation=1.
> > > > 
> > > > Why wouldn't the event_activation=1 case want a dependency on
> > > > udev-settle?
> > > 
> > > You said it should wait for multipathd, which in turn waits for udev
> > > settle. And indeed it makes some sense. After all: the idea was to
> > > avoid locking issues or general resource starvation during uevent
> > > storms, which typically occur in the coldplug phase, and for which
> > > the completion of "udev settle" is the best available indicator.
> > 
> > Hi Martin, thanks, you have some interesting details here.
> > 
> > Right, the idea is for lvm-activate-vgs-last to wait for other
> > services like multipath (or anything else that a PV would typically
> > sit on), so that it will be able to activate as many VGs as it can
> > that are present at startup.  And we avoid responding to individual
> > coldplug events for PVs, saving time/effort/etc.
> > 
> > > I'm arguing against it (perhaps you want to join in :-), but odds
> > > are that it'll disappear sooner or later. For the time being, I
> > > don't see a good alternative.
> > 
> > multipath has more complex udev dependencies, I'll be interested to
> > see how you manage to reduce those, since I've been reducing/isolating
> > our udev usage also.
> 
> I have pondered this quite a bit, but I can't say I have a concrete
> plan.
> 
> To avoid depending on "udev settle", multipathd needs to partially
> revert to udev-independent device detection. At least during initial
> startup, we may encounter multipath maps with members that don't exist
> in the udev db, and we need to deal with this situation gracefully. We
> currently don't, and it's a tough problem to solve cleanly. Not relying
> on udev opens up a Pandora's box wrt WWID determination, for example.
> Any such change would without doubt carry a large risk of regressions
> in some scenarios, which we wouldn't want to happen in our large
> customers' data centers.

I'm not actually sure that it's as bad as all that. We just may need a
way for multipathd to detect whether the coldplug has happened.  I'm
sure that if we say we need it in order to remove the udev settle, we
can get some method to check this. Perhaps there is one already that I
don't know about. If multipathd starts up and the coldplug hasn't
happened, we can just assume the existing devices are correct, and set
up the paths enough to check them, until we are notified that the
coldplug has finished. Then we just run reconfigure and continue along
as it currently does.  The basic idea is to have multipathd run in a
mode where its only concern is monitoring the paths of the existing
devices, until we're notified that the coldplug has completed. The
important thing would be to make sure that we can't accidentally miss
the notification that the coldplug has completed. But we could always
time out if it takes too long and we haven't gotten any uevents
recently.
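
Roughly the shape of what I mean, with the marker path, timeout, and
reconfigure hook as pure placeholders rather than an actual multipathd
interface:

#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical marker that some component would create once the
 * coldplug uevent storm has been replayed.  Placeholder name only. */
#define COLDPLUG_DONE_MARKER "/run/multipathd/coldplug-done"

/* Fallback: give up waiting after this many seconds so that a missed
 * notification can't make us hang in monitor-only mode forever. */
#define COLDPLUG_TIMEOUT 60

static bool coldplug_finished(void)
{
        return access(COLDPLUG_DONE_MARKER, F_OK) == 0;
}

/* Placeholder for multipathd's real reconfigure entry point. */
static void reconfigure(void)
{
        puts("coldplug done: full reconfigure, normal event handling");
}

int main(void)
{
        time_t start = time(NULL);

        if (!coldplug_finished()) {
                /* Monitor-only mode: trust the existing maps, set up
                 * just enough state to run path checkers on them. */
                puts("coldplug pending: monitor existing paths only");
                while (!coldplug_finished() &&
                       time(NULL) - start < COLDPLUG_TIMEOUT)
                        sleep(1);       /* or: wait for a notification */
        }
        reconfigure();
        return 0;
}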
 
> I also looked into Lennart's "storage daemon" concept where multipathd
> would continue running over the initramfs/rootfs switch, but that would
> be yet another step with even higher risk.

This is the "set argv[0][0] = '@' to disble initramfs daemon killing"
concept, right? We still have the problem where the udev database gets
cleared, so if we ever need to look at that while processing the
coldplug events, we'll have problems.
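
For reference, the marking itself is trivial; a rough sketch, assuming
the daemon can tell it was started from the initramfs (the
/etc/initrd-release check below is just one way to do that):

#include <unistd.h>

/* Assumed check: systemd-style initramfs environments ship an
 * /etc/initrd-release file.  Real code might test differently. */
static int running_from_initramfs(void)
{
        return access("/etc/initrd-release", F_OK) == 0;
}

int main(int argc, char **argv)
{
        /* Rewriting the first byte of argv[0] to '@' is the convention
         * systemd documents for root storage daemons: it tells the
         * initramfs shutdown/switch-root code not to kill the process. */
        if (argc > 0 && running_from_initramfs() && argv[0][0] != '@')
                argv[0][0] = '@';

        /* ... daemon main loop would follow here ... */
        pause();
        return 0;
}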

> > 
> > > The dependency type you have to use depends on what you need. Do you
> > > really only depend on udev settle because of multipathd? I don't
> > > think so; even without multipath, thousands of PVs being probed
> > > simultaneously can bring the performance of parallel pvscans down.
> > > That was the original motivation for this discussion, after all. If
> > > this is so, you should use both "Wants" and "After". Otherwise, using
> > > only "After" might be sufficient.
> > 
> > I don't think we really need the settle.  If device nodes for PVs are
> > present, then vgchange -aay from lvm-activate-vgs* will see them and
> > activate VGs from them, regardless of what udev has or hasn't done
> > with them yet.
> 
> Hm. This would mean that the switch to event-based PV detection could
> happen before "udev settle" ends. A coldplug storm of uevents could
> create 1000s of PVs in a blink after event-based detection was enabled.
> Wouldn't that resurrect the performance issues that you are trying to
> fix with this patch set?
> 
> > 
> > > > - Reading the udev db: with the default
> > > >   external_device_info_source=none we no longer ask the udev db
> > > >   for any info about devs.  (We now follow that setting strictly,
> > > >   and only ask udev when source=udev.)
> > > 
> > > This is a different discussion, but if you don't ask udev, how do
> > > you determine (reliably, and consistently with other services)
> > > whether a given device will be part of a multipath device or an MD
> > > RAID member?
> > 
> > Firstly, with the new devices file, only the actual md/mpath device
> > will be in the devices file, the components will not be, so lvm will
> > never attempt to look at an md or mpath component device.
> 
> I have to look more closely into the devices file and how it's created
> and used. 
> 
> > Otherwise, when the devices file is not used,
> > md: from reading the md headers from the disk
> > mpath: from reading sysfs links and /etc/multipath/wwids
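
Side note: "reading the md headers from the disk" basically means
checking for the MD superblock magic on the device itself. A minimal
sketch for v1.2 metadata, which sits 4 KiB into the member device
(0.90/1.0/1.1 metadata live at other offsets, which real code would
also probe):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MD_SB_MAGIC 0xa92b4efcU     /* stored little-endian on disk */
#define MD_1_2_SB_OFFSET 4096       /* v1.2 superblock lives at 4 KiB */

/* Returns 1 if the device carries an MD v1.2 superblock, 0 if not,
 * -1 on error. */
static int is_md_v1_2_member(const char *devpath)
{
        unsigned char buf[4];
        uint32_t magic;
        int fd = open(devpath, O_RDONLY | O_CLOEXEC);

        if (fd < 0)
                return -1;
        if (pread(fd, buf, sizeof(buf), MD_1_2_SB_OFFSET) != sizeof(buf)) {
                close(fd);
                return -1;
        }
        close(fd);
        /* the superblock starts with a little-endian magic field */
        magic = buf[0] | buf[1] << 8 | buf[2] << 16 | (uint32_t)buf[3] << 24;
        return magic == MD_SB_MAGIC;
}

int main(int argc, char **argv)
{
        if (argc > 1)
                printf("%s: %d\n", argv[1], is_md_v1_2_member(argv[1]));
        return 0;
}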
> 
> Ugh. Reading sysfs links means that you're indirectly depending on
> udev, because udev creates those. It's *more* fragile than calling into
> libudev directly, IMO. Using /etc/multipath/wwids is plain wrong in
> general. It works only on distros that use "find_multipaths strict",
> like RHEL. Not to mention that the path can be customized in
> multipath.conf.

I admit that a wwid being in the wwids file doesn't mean that it is
definitely a multipath path device (it could always still be blacklisted
for instance). Also, the ability to move the wwids file is unfortunate,
and probably never used. But it is the case that every wwid in the wwids
file has had a multipath device successfully created for it. This is
true regardless of the find_multipaths setting, and seems to me to be a
good hint. Conversely, if a device wwid isn't in the wwids file, then it
very likely has never been multipathed before (assuming that the wwids
file is on a writable filesystem).

So relying on it being correct is wrong, but it certainly provides
useful hints.
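
In other words, the hint is just a membership test against the wwids
file, where each entry is the wwid wrapped in slashes. A rough sketch of
what that check amounts to (default path hard-coded, comment lines
skipped, nothing clever):

#include <stdio.h>
#include <string.h>

#define WWIDS_FILE "/etc/multipath/wwids"   /* default path; configurable */

/* Returns 1 if wwid appears in the wwids file, 0 if not (or if the
 * file can't be read).  Entries look like "/<wwid>/"; lines starting
 * with '#' are comments. */
static int wwid_seen_before(const char *wwid)
{
        char line[4096], needle[4096];
        FILE *f = fopen(WWIDS_FILE, "r");
        int found = 0;

        if (!f)
                return 0;
        snprintf(needle, sizeof(needle), "/%s/", wwid);
        while (fgets(line, sizeof(line), f)) {
                if (line[0] == '#')
                        continue;
                if (strncmp(line, needle, strlen(needle)) == 0) {
                        found = 1;
                        break;
                }
        }
        fclose(f);
        return found;
}

int main(int argc, char **argv)
{
        if (argc > 1)
                printf("%s: %s\n", argv[1],
                       wwid_seen_before(argv[1]) ?
                       "seen before" : "never multipathed here");
        return 0;
}

Presence only means a map was created for that wwid at some point;
absence only means it probably never was on this host.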

> > 
> > > In the past, there were issues with either pvscan or blkid (or
> > > multipath) failing to open a device while another process had opened
> > > it exclusively. I've never understood all the subtleties. See systemd
> > > commit 3ebdb81 ("udev: serialize/synchronize block device event
> > > handling with file locks").
> > 
> > Those locks look like a fine solution if a problem comes up like that.
> > I suspect the old issues may have been caused by a program using an
> > exclusive open when it shouldn't.
> 
> Possible. I haven't seen many of these issues recently. Very rarely, I
> see reports of a mount command mysteriously, sporadically failing
> during boot. It's very hard to figure out why that happens if it does.
> I suspect some transient effect of this kind.
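
As far as I understand those locks: udev takes a shared BSD flock on
the whole-disk node while it processes an event, and a tool that
doesn't want udev interfering takes the exclusive lock for the duration
of its changes. Roughly the tool side looks like this (sketch only,
minimal error handling):

#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "/dev/sdX"; /* placeholder */
        int fd = open(dev, O_RDONLY | O_CLOEXEC);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* Exclusive BSD lock on the device node; udev's event handling
         * takes the shared lock, so it stays out of the way while we
         * hold this. */
        if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
                perror("flock");   /* someone else (e.g. udev) holds it */
                close(fd);
                return 1;
        }

        /* ... modify the device (wipe, format, repartition, ...) ... */

        flock(fd, LOCK_UN);
        close(fd);
        return 0;
}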
> 
> > 
> > > After=udev-settle will make sure that you're past a coldplug uevent
> > > storm during boot. IMO this is the most important part of the
> > > equation. I'd be happy to find a solution for this that doesn't rely
> > > on udev settle, but I don't see any.
> > 
> > I don't think multipathd is listening to uevents directly?  If it
> > were, you might use a heuristic to detect a change in uevents (e.g.
> > the volume) and conclude coldplug is finished.
> 
> multipathd does listen to uevents (only "udev" events, not "kernel").
> But that doesn't help us on startup. Currently we try hard to start up
> after coldplug is finished. multipathd doesn't have a concurrency issue
> like LVM2 (at least I hope so; it handles events with just two threads,
> a producer and a consumer). The problem is rather that dm devices
> survive the initramfs->rootfs switch, while member devices don't (see
> above).
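
For anyone following along, the "udev" vs. "kernel" distinction is just
which netlink group the libudev monitor subscribes to; "udev" delivers
an event only after the rules have been processed. A generic sketch of
such a listener (not multipathd's actual code):

/* build with: cc monitor.c -ludev */
#include <libudev.h>
#include <poll.h>
#include <stdio.h>

int main(void)
{
        struct udev *udev = udev_new();
        /* "udev" = post-rule-processing events; "kernel" would be the
         * raw kernel uevents before the rules have run. */
        struct udev_monitor *mon = udev_monitor_new_from_netlink(udev, "udev");
        struct pollfd pfd;

        udev_monitor_filter_add_match_subsystem_devtype(mon, "block", NULL);
        udev_monitor_enable_receiving(mon);
        pfd.fd = udev_monitor_get_fd(mon);
        pfd.events = POLLIN;

        for (;;) {
                struct udev_device *dev;

                if (poll(&pfd, 1, -1) <= 0)
                        continue;
                dev = udev_monitor_receive_device(mon);
                if (!dev)
                        continue;
                printf("%s %s\n", udev_device_get_action(dev),
                       udev_device_get_syspath(dev));
                udev_device_unref(dev);
        }
}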
> 
> Cheers,
> Martin
> 
> 
> > 
> > Dave
> > 





