On Wed, Feb 12, 2020 at 05:47:31PM -0500, Daniel Jordan wrote:
> padata has been undergoing some surgery over the last year[0] and now seems
> ready for another enhancement: splitting up and multithreading CPU-intensive
> kernel work.
>
> Quoting from an earlier series[1], the problem I'm trying to solve is
>
>   A single CPU can spend an excessive amount of time in the kernel operating
>   on large amounts of data.  Often these situations arise during initialization-
>   and destruction-related tasks, where the data involved scales with system
>   size.  These long-running jobs can slow startup and shutdown of applications
>   and the system itself while extra CPUs sit idle.
>
> Here are the current consumers:
>
>  - struct page init (boot, hotplug, pmem)
>  - VFIO page pinning (kvm guest init)
>  - fallocating a hugetlb file (database shared memory init)
>
> On a large-memory server, DRAM page init is ~23% of kernel boot (3.5s/15.2s),
> and it takes over a minute to start a VFIO-enabled kvm guest or fallocate a
> hugetlb file that occupy a significant fraction of memory.  This work results
> in 7-20x speedups and is currently increasing the uptime of our production
> kernels.
>
> Future areas include munmap/exit, umount, and __ib_umem_release.  Some of these
> need coarse locks broken up for multithreading (zone->lock, lru_lock).

I'm aware of this ib_umem_release request; it would be interesting to
see. The main workload here is put_page and dma_unmap.

Jason