Re: lvmpolld causes high cpu load issue

On Wed, 2022-08-17 at 14:54 +0200, Zdenek Kabelac wrote:
> On 17. 08. 22 at 14:39, Martin Wilck wrote:
> 
> Let's make clear we are very well aware of all the constraints
> associated with udev rule logic (and we tried quite hard to minimize
> the impact - however the udevd developers kind of 'misunderstood' how
> badly they would be impacting system performance with the existing
> watch rule logic - and the story kind of 'continues' with systemd's &
> D-Bus services, unfortunately...)

I dimly remember you dislike udev ;-)

I like the general idea of the udev watch. It is the mechanism that
makes newly created partitions appear in the system automatically,
which is very convenient for users and wouldn't work otherwise. I can
see that it might be inappropriate for LVM PVs. We can discuss changing
the rules such that the watch is disabled for LVM devices (both PV and
LV). I don't claim to foresee all possible side effects, but it might
be worth a try. It would mean that newly created LVs, LV size changes,
etc. would not become visible in the system immediately. I suppose you
could work around that in the LVM tools by triggering change events
after operations like lvcreate.
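For illustration, a rule along these lines might disable the watch for
device-mapper devices (a sketch only - the file name and match keys are
my assumptions here, and the real lvm2/dm rules are more involved):

```
# Hypothetical addition to a udev rules file, e.g.
# /etc/udev/rules.d/90-dm-nowatch.rules (name made up for this sketch):
# drop the inotify watch for device-mapper devices, so close-after-write
# on a PV/LV no longer generates synthetic "change" events.
SUBSYSTEM=="block", KERNEL=="dm-*", OPTIONS+="nowatch"
```

The tools could then compensate by emitting a synthetic change event
themselves, e.g. something like "udevadm trigger --action=change" on
the affected device node after lvcreate, lvresize, and similar
operations.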

> However let's focus on 'pvmove', as it is a potentially very lengthy
> operation - so it's not feasible to keep the VG locked/blocked across
> an operation which might take even days with slower storage and big
> moved sizes (write access/lock disables all readers...)

So these close-after-write operations are caused by locking/unlocking
the PVs?

Note: we observed that watch events were triggered every 30 s, for
every PV, simultaneously (@Heming, correct me if I'm wrong here).
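One way to check that pattern from a "udevadm monitor --udev" capture
is a sketch like the following (the sample lines, timestamps, and
device paths are made up for illustration - the real capture came from
the affected system):

```shell
# Hypothetical snippet of `udevadm monitor --udev` output: two DM
# devices, two bursts of "change" events 30 s apart.
sample='UDEV  [1000.1] change   /devices/virtual/block/dm-1 (block)
UDEV  [1000.2] change   /devices/virtual/block/dm-2 (block)
UDEV  [1030.1] change   /devices/virtual/block/dm-1 (block)
UDEV  [1030.2] change   /devices/virtual/block/dm-2 (block)'

# Count "change" events per device path (next-to-last field). Every
# device showing the same count hints that the watch fires for all PVs
# at the same time.
printf '%s\n' "$sample" |
    awk '$3 == "change" {n[$(NF-1)]++} END {for (d in n) print d, n[d]}' |
    sort
# prints each device path with a count of 2
```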

> So lvm2 does try to minimize locking time. We will re-validate
> whether just the necessary 'vg updating' operations are using 'write'
> access - since occasionally, due to some unrelated code changes, it
> might eventually result in an unwanted 'write' VG open - but we can't
> keep the operation blocking a whole VG because of slow udev rule
> processing.

> In normal circumstances the udev rules should be processed very fast
> - unless there is something mis-designed causing CPU overloading.
> 

IIRC there is no evidence that the udev rules are really processed
"slowly". udev isn't efficient; a run time on the order of 10 ms is
expected for a worker. We tried different tracing approaches, but we
never saw "multipath -U" hanging on a lock or a resource shortage. It
seems to be the sheer amount of events and processes that is causing
trouble. The customer had a very lengthy multipath.conf file (~50k
lines), which needs to be parsed by every new multipath instance; that
was slowing things down somewhat. Still, the runtime of "multipath -U"
would be no more than 100 ms, AFAICT.

Martin

_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/




