Re: lvmpolld causes high cpu load issue

On 17. 08. 22 at 14:39, Martin Wilck wrote:
On Wed, 2022-08-17 at 18:47 +0800, Heming Zhao wrote:
On Wed, Aug 17, 2022 at 11:46:16AM +0200, Zdenek Kabelac wrote:



ATM I'm not even sure whether you are complaining about the CPU usage of lvmpolld itself or just about the huge udev rule processing overhead.

The load is generated by multipath. lvmpolld performs the close-after-write that raises the IN_CLOSE_WRITE event, which is the trigger.

Let's be clear here: every close-after-write operation triggers udev's "watch" mechanism for block devices, which causes the udev rules to be executed for the device. That is not a cheap operation. In the case at hand, the customer was observing a lot of "multipath -U" commands. So apparently a significant part of the udev rule processing was spent in "multipath -U". Running "multipath -U" is important, because the rule could have been triggered by a change in the number of available path devices, and later commands run from udev rules might hang indefinitely if the multipath device had no usable paths any more. "multipath -U" is already quite well optimized, but it needs to do some I/O to complete its work, thus it takes a few milliseconds to run.
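
To illustrate why every close-after-write is expensive, here is a minimal C sketch of the inotify mechanism that udev's "watch" option is built on: a watch for IN_CLOSE_WRITE on a device node, where each event would correspond to one full udev rule run (including "multipath -U"). /dev/sdX is a placeholder device node, and this is only an illustration of the mechanism, not udev's actual implementation.

/* Minimal sketch of the inotify-based "watch" that udev places on block
 * device nodes: any close() after a write generates IN_CLOSE_WRITE, which
 * udev turns into a synthetic "change" event and a full rule run.
 * /dev/sdX is a placeholder device node. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void)
{
    char buf[4096];
    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0) { perror("inotify_init1"); return EXIT_FAILURE; }

    /* udev adds a watch like this for every block device that has the
     * "watch" option set in its rules. */
    if (inotify_add_watch(fd, "/dev/sdX", IN_CLOSE_WRITE) < 0) {
        perror("inotify_add_watch");
        return EXIT_FAILURE;
    }

    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));
        if (len <= 0)
            break;
        /* Each batch of events here corresponds to close-after-write on the
         * device, i.e. another full pass through the udev rules. */
        printf("close-after-write seen, %zd bytes of events\n", len);
    }
    close(fd);
    return 0;
}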

IOW, it would be misleading to point at multipath. close-after-write
operations on block devices should be avoided if possible. As you
probably know, the purpose of udev's "watch" operation is to be able to
determine changes on layered devices, e.g. newly created LVs or the
like. "pvmove" is special, because by definition it will usually not
cause any changes in higher layers. Therefore it might make sense to
disable the udev watch on the affected PVs while pvmove is running, and
trigger a single change event (re-enabling the watch) after the pvmove
has finished. If that is impossible, lvmpolld and other lvm tools that
are involved in the pvmove operation should avoid calling close() on
the PVs, IOW keep the fds open until the operation is finished.
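
As an illustration of the "single change event after pvmove" idea, here is a hedged C sketch that emits one synthetic change uevent by writing "change" to a device's sysfs uevent file, which is essentially what e.g. "udevadm trigger --action=change" does. The sysfs path is a placeholder and the error handling is minimal; this is a sketch of the idea, not a proposed lvm2 patch.

/* Hedged sketch of the "single change event after pvmove" idea: writing
 * "change" to a device's uevent file makes the kernel emit one synthetic
 * change uevent, so udev re-runs its rules (and re-arms the watch) once,
 * instead of once per close-after-write during the move.
 * The sysfs path is a placeholder. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int trigger_change(const char *uevent_path)
{
    int fd = open(uevent_path, O_WRONLY);
    if (fd < 0) {
        perror("open uevent");
        return -1;
    }
    if (write(fd, "change", 6) != 6) {
        perror("write uevent");
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}

int main(void)
{
    /* e.g. the PV that pvmove has just finished touching */
    return trigger_change("/sys/class/block/sdX/uevent") ? 1 : 0;
}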

Hi

Let's make clear that we are very well aware of all the constraints associated with the udev rule logic (and we tried quite hard to minimize the impact). However, the udevd developers kind of 'misunderstood' how badly the existing watch rule logic would impact system performance, and the story kind of 'continues' with systemd & D-Bus services, unfortunately...

However, let's focus on 'pvmove', as it is a potentially very lengthy operation. It's not feasible to keep the VG locked/blocked across an operation which might take even days with slower storage and large moved sizes (a write access/lock disables all readers...).

So lvm2 does try to minimize the locking time. We will re-validate that only the necessary 'VG updating' operations use 'write' access - occasionally, due to some unrelated code changes, an unwanted 'write' VG open might slip in - but we can't keep the operation blocking a whole VG because of slow udev rule processing.

Under normal circumstances a udev rule should be processed very fast - unless there is something mis-designed causing CPU overload.

But as mentioned a few times already - without more knowledge about the case we can hardly guess the exact reason. We have already provided a useful suggestion: reduce the number of devices udev has to 'process' by reducing the number of 'lvm2 metadata PVs'. A big reason for frequent metadata updates would be heavy segmentation of the LV - but we will not know this without seeing the user's VG 'metadata' in this case...
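
To make the 'fewer metadata PVs' suggestion a bit more concrete, here is a small, hedged C sketch that just wraps the pvchange CLI to mark a PV's metadata areas as ignored, so routine metadata updates touch fewer devices and generate fewer close-after-write events. /dev/sdX is a placeholder PV; whether dropping metadata copies is acceptable depends on the redundancy the VG needs, so check pvchange(8) before doing this on a real VG.

/* Hedged sketch: mark a PV's metadata areas as ignored so metadata
 * updates during pvmove are written to fewer devices.
 * /dev/sdX is a placeholder PV. */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int pv_metadata_ignore(const char *pv)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execlp("pvchange", "pvchange", "--metadataignore", "y", pv, (char *)NULL);
        _exit(127);             /* exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

int main(void)
{
    return pv_metadata_ignore("/dev/sdX") ? EXIT_FAILURE : EXIT_SUCCESS;
}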


Zdenek

_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/



