On 6/9/21 12:18 AM, heming.zhao@xxxxxxxx wrote:
On 6/8/21 5:30 AM, David Teigland wrote:
On Mon, Jun 07, 2021 at 10:27:20AM +0000, Martin Wilck wrote:
Most importantly, this was about LVM2 scanning of physical volumes. The
number of udev workers has very little influence on PV scanning,
because the udev rules only activate a systemd service; the actual
scanning takes place in lvm2-pvscan@.service. And unlike udev, there is
no limit on the number of instances of a given systemd service
template that can run at any given time.
Excessive device scanning has been the historical problem in this area,
but Heming mentioned dev_cache_scan() specifically as a problem. That was
surprising to me since it doesn't scan/read devices; it just creates a
list of device names on the system (either readdir in /dev or a udev
listing). If there are still problems with excessive scanning/reading,
we'll need some more diagnosis of what's happening; there could be some
cases we've missed.
dev_cache_scan itself doesn't issue direct disk IO, but libudev will scan/read
the udev db, which does issue real disk IO (the db location is /run/udev/data).
We can see that the combination "obtain_device_list_from_udev=0 &
event_activation=1" largely reduces booting time, from 2min6s to 40s.
The key is that dev_cache_scan() then scans the devices by itself (scanning "/dev").
I am not very familiar with systemd-udev; below is a little more info about
the libudev path (a rough sketch follows the list below). The top function is
_insert_udev_dir, and this function:
1. scans/reads /sys/class/block/. O(n)
2. scans/reads the udev db (/run/udev/data). maybe O(n)
   udev will call device_read_db => handle_db_line to handle every
   line of a db file.
3. does qsort & deduplication on the devices list. O(n) + O(n)
4. does lots of "memory alloc" & "string copy" work while it runs.
   It also takes too much memory; on the host side, 'top' shows:
   - direct activation used only ~2G of memory during boot
   - event activation cost ~20G of memory.
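
To make that libudev path a bit more concrete, here is a minimal,
self-contained sketch (build with -ludev) of the kind of enumeration
involved. This is not LVM2's actual _insert_udev_dir code, just an
illustration: enumerating the block devices and querying their properties
is roughly where the per-device db files under /run/udev/data get read
(device_read_db).

    #include <stdio.h>
    #include <libudev.h>

    int main(void)
    {
        struct udev *udev = udev_new();
        struct udev_enumerate *en;
        struct udev_list_entry *first, *entry;

        if (!udev)
            return 1;

        /* enumerate every device in the "block" subsystem (from /sys) */
        en = udev_enumerate_new(udev);
        udev_enumerate_add_match_subsystem(en, "block");
        udev_enumerate_scan_devices(en);

        first = udev_enumerate_get_list_entry(en);
        udev_list_entry_foreach(entry, first) {
            const char *syspath = udev_list_entry_get_name(entry);
            struct udev_device *dev =
                udev_device_new_from_syspath(udev, syspath);
            const char *node, *fstype;

            if (!dev)
                continue;
            /* asking for properties is what pulls in the per-device
             * db file under /run/udev/data for this device */
            node = udev_device_get_devnode(dev);
            fstype = udev_device_get_property_value(dev, "ID_FS_TYPE");
            printf("%s fstype=%s\n",
                   node ? node : "(no node)", fstype ? fstype : "-");
            udev_device_unref(dev);
        }

        udev_enumerate_unref(en);
        udev_unref(udev);
        return 0;
    }

Every pvscan-like process doing this enumeration touches one small db file
per device, which is where I'd expect the parallel load on /run/udev/data to
come from.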
I didn't test the related udev code, and my guess is that <2> takes too much
time. There are thousands of scanning jobs reading /run/udev/data in parallel,
while at the same time many devices need to generate their udev db files in
the same dir. I am not sure the filesystem can handle this scenario perfectly.
The other code path, obtain_device_list_from_udev=0, triggers scanning/reading
"/dev" instead (a minimal sketch of that follows below); this dir sees less
write IO than /run/udev/data.
Regards
heming
I made a minor mistake: in <3> above, the qsort time is O(n log n), not O(n).
More info about my analysis:
I set a filter in lvm.conf: filter = [ "a|/dev/vda2|", "r|.*|" ]
The booting time was reduced a little, from 2min 6s to 1min 42s.
The VM's vda2 layout:
# lsblk | egrep -A 4 "^vd"
vda 253:0 0 40G 0 disk
├─vda1 253:1 0 8M 0 part
└─vda2 253:2 0 40G 0 part
├─system-swap 254:0 0 2G 0 lvm [SWAP]
└─system-root 254:1 0 35G 0 lvm /
The filter rule rejects every device except /dev/vda2, the PV that holds the
rootfs LVs.
The rule makes _pvscan_cache_args() remove devs from devl->list via the nodata
filters.
The hot spot narrows down to setup_devices() (which calls dev_cache_scan()).
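
As a side note, my understanding of how that filter line behaves (patterns
tried in order, first accept/reject match wins, no match means accept) is
roughly the toy model below. This is not LVM's actual filter code, just an
illustration of why only /dev/vda2 survives:

    #include <stdio.h>
    #include <regex.h>

    /* toy model of filter = [ "a|/dev/vda2|", "r|.*|" ] */
    static const struct { char action; const char *re; } rules[] = {
        { 'a', "/dev/vda2" },   /* accept anything matching /dev/vda2 */
        { 'r', ".*" },          /* reject everything else             */
    };

    static int accepted(const char *devname)
    {
        for (unsigned i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
            regex_t re;
            int match;

            if (regcomp(&re, rules[i].re, REG_EXTENDED | REG_NOSUB))
                continue;
            match = (regexec(&re, devname, 0, NULL, 0) == 0);
            regfree(&re);
            if (match)                      /* first match decides */
                return rules[i].action == 'a';
        }
        return 1;                           /* no rule matched: accept */
    }

    int main(void)
    {
        const char *devs[] = { "/dev/vda1", "/dev/vda2",
                               "/dev/mapper/system-root" };

        for (unsigned i = 0; i < 3; i++)
            printf("%-24s %s\n", devs[i],
                   accepted(devs[i]) ? "accept" : "reject");
        return 0;
    }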
_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/