Re: lvmpolld causes IO performance issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dne 16. 08. 22 v 11:28 Heming Zhao napsal(a):
Hello maintainers & list,

I bring a story:
One SUSE customer suffered lvmpolld issue, which cause IO performance dramatic
decrease.

How to trigger:
When machine connects large number of LUNs (eg 80~200), pvmove (eg, move a single
disk to a new one, cmd like: pvmove disk1 disk2), the system will suffer high
cpu load. But when system connects ~10 LUNs, the performance is fine.

We found two work arounds:
1. set lvm.conf 'activation/polling_interval=120'.
2. write a speical udev rule, which make udev ignore the event for mpath devices.
    echo 'ENV{DM_UUID}=="mpath-*", OPTIONS+="nowatch"' >\
     /etc/udev/rules.d/90-dm-watch.rules

Run above any one of two can make the performance issue disappear.

** the root cause **

lvmpolld will do interval requeset info job for updating the pvmove status

On every polling_interval time, lvm2 will update vg metadata. The update job will
call sys_close, which will trigger systemd-udevd IN_CLOSE_WRITE event, eg:
   2022-<time>-xxx <hostname> systemd-udevd[pid]: dm-179: Inotify event: 8 for /dev/dm-179
(8 is IN_CLOSE_WRITE.)

These VGs underlying devices are multipath devices. So when lvm2 update metatdata,
even if pvmove write a few data, the sys_close action trigger udev's "watch"
mechanism to gets notified frequently about a process that has written to the
device and closed it. This causes frequent, pointless re-evaluation of the udev
rules for these devices.

My question: Does LVM2 maintainers have any idea to fix this bug?

In my view, does lvm2 could drop VGs devices fds until pvmove finish?

Hi

Please provide more info about lvm2 metadata and also some 'lvs -avvvvv' trace so we can get better picture about the layout - also version of lvm2,systemd,kernel in use.

pvmove is progressing by mirroring each segment of an LV - so if there would be a lot of segments - then each such update may trigger udev watch rule event.

But ATM I could hardly imagine how this could cause some 'dramatic' performance decrease - maybe there is something wrong with udev rules on the system ?

What is the actual impact ?

Note - pvmove was never designed as a high performance operation (in fact it tries to not eat all the disk bandwidth as such)


Regards

Zdenek

_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/




[Index of Archives]     [Gluster Users]     [Kernel Development]     [Linux Clusters]     [Device Mapper]     [Security]     [Bugtraq]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]

  Powered by Linux