On Wed, Feb 12, 2020 at 07:31:00PM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 12, 2020 at 05:47:31PM -0500, Daniel Jordan wrote:
> > padata has been undergoing some surgery over the last year[0] and now seems
> > ready for another enhancement: splitting up and multithreading CPU-intensive
> > kernel work.
> >
> > Quoting from an earlier series[1], the problem I'm trying to solve is
> >
> >   A single CPU can spend an excessive amount of time in the kernel operating
> >   on large amounts of data.  Often these situations arise during
> >   initialization- and destruction-related tasks, where the data involved
> >   scales with system size.  These long-running jobs can slow startup and
> >   shutdown of applications and the system itself while extra CPUs sit idle.
> >
> > Here are the current consumers:
> >
> >  - struct page init (boot, hotplug, pmem)
> >  - VFIO page pinning (kvm guest init)
> >  - fallocating a hugetlb file (database shared memory init)
> >
> > On a large-memory server, DRAM page init is ~23% of kernel boot
> > (3.5s/15.2s), and it takes over a minute to start a VFIO-enabled kvm guest
> > or fallocate a hugetlb file that occupies a significant fraction of memory.
> > This work results in 7-20x speedups and is currently increasing the uptime
> > of our production kernels.
> >
> > Future areas include munmap/exit, umount, and __ib_umem_release.  Some of
> > these need coarse locks broken up for multithreading (zone->lock, lru_lock).
>
> I'm aware of this ib_umem_release request, it would be interesting to
> see, the main workload here is put_page and dma_unmap

Ah yes, I see it gets all the way down to zone->lock, so I should've said
_all_ of the future cases need coarse locks broken.

By the way, there's an idea for dealing with zone->lock that I haven't yet
had time to look at.

http://lkml.kernel.org/r/20181018111632.GM5819@xxxxxxxxxxxxxxxxxxx