RFD: multithreading in padata

Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> · Tue, 3 Dec 2019 10:58:41 -0500

[resending in modified form since this didn't seem to reach the lists]

Hi,

padata has been undergoing some surgery lately[0] and now seems ready for
another enhancement: splitting up and multithreading CPU-intensive kernel work.

I'm planning to send an RFC for this, but I wanted to post some thoughts on the
design and a work-in-progress branch first to see if the direction looks ok.

Quoting from an earlier series[1], this is the problem I'm trying to solve:

  A single CPU can spend an excessive amount of time in the kernel operating
  on large amounts of data.  Often these situations arise during initialization-
  and destruction-related tasks, where the data involved scales with system
  size.  These long-running jobs can slow startup and shutdown of applications
  and the system itself while extra CPUs sit idle.

There are several paths where this problem exists, but here are three to start:

 - struct page initialization (at boot-time, during memory hotplug, and for
   persistent memory)
 - VFIO page pinning (kvm guest initialization)
 - fallocating a HugeTLB page (database initialization)

padata is a general mechanism for parallel work and so seems natural for this
functionality[2], but now it can only manage a series of small, ordered jobs.

The coming RFC will bring enhancements to split up a large job among a set of
helper threads according to the user's wishes, load balance the work between
them, set concurrency limits to control the overall number of helpers in the
system and per NUMA node, and run extra helper threads beyond the first at a
low priority level so as not to disturb other activity on the system for the
sake of an optimization.  (While extra helpers are run at low priority for most
of the job, their priority is raised one by one at job end to match the
caller's to avoid starvation on a busy system.)

The existing padata interfaces and features will remain, but serialization
becomes optional because these sorts of jobs don't need it.

The advantage to enhancing padata rather than having the multithreading stand
alone is that there would be one central place in the kernel to manage the
number of helper threads that run at any given time.  A machine's idle CPU
resources can be harnessed yet controlled (the low priority idea) to provide
the right amount of multithreading for the system.

Here's a work-in-progress branch with some of this already done in the last
five patches.

    git://oss.oracle.com/git/linux-dmjordan.git padata-mt-wip
    https://oss.oracle.com/git/gitweb.cgi?p=linux-dmjordan.git;a=shortlog;h=refs/heads/padata-mt-wip

Thoughts?  Questions?

Thanks.

Daniel

[0] https://lore.kernel.org/linux-crypto/?q=s%3Apadata+d%3A20190101..
[1] https://lore.kernel.org/lkml/20181105165558.11698-1-daniel.m.jordan@xxxxxxxxxx/
[2] https://lore.kernel.org/lkml/20181107103554.GL9781@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/