Optimistic parallelization can go wrong if too many helpers are started
on a busy system.  They can unfairly degrade the performance of other
tasks, so they should be sensitive to current CPU utilization[1].

Achieve this by running helpers at MAX_NICE so that their CPU time is
proportional to idle CPU time.  The main thread, however, runs at its
original priority so that it can make progress on a heavily loaded
system, as it would if padata were not in the picture.

Here are two test cases in which a padata and a non-padata workload
compete for the same CPUs to show that normal priority (i.e. nice=0)
padata helpers cause the non-padata workload to run more slowly, whereas
MAX_NICE padata helpers don't.

Notes:
  - Each case was run using 8 CPUs on a large two-socket server, with a
    cpumask allowing all test threads to run anywhere within the 8.
  - The non-padata workload used 7 threads and the padata workload used
    8 threads to evaluate how much padata helpers, rather than the main
    padata thread, disturbed the non-padata workload.
  - The non-padata workload was started after the padata workload and
    run for less time to maximize the chances that the non-padata
    workload would be disturbed.
  - Runtimes in seconds.

Case 1: Synthetic, worst-case CPU contention

  padata_test - a tight loop doing integer multiplication to max out on
                CPU; used for testing only, does not appear in this
                series
  stress-ng   - cpu stressor ("-c --cpu-method ackerman --cpu-ops 1200")

                stress-ng alone (stdev)  max_nice (stdev)  normal_prio (stdev)
                --------------------------------------------------------------
  padata_test                             96.87   ( 1.09)    90.81    ( 0.29)
  stress-ng       43.04       ( 0.00)     43.58   ( 0.01)    75.86    ( 0.39)

  MAX_NICE helpers make a significant difference compared to normal
  priority helpers, with stress-ng taking 76% longer to finish when
  competing with normal priority padata threads than when run by itself,
  but only 1% longer when run with MAX_NICE helpers.  The 1% comes from
  the small amount of CPU time MAX_NICE threads are given despite their
  low priority.

Case 2: Real-world CPU contention

  padata_vfio - VFIO page pin a 175G kvm guest
  usemem      - faults in 25G of anonymous THP per thread, PAGE_SIZE
                stride; used to mimic the page clearing that dominates
                in padata_vfio so that usemem competes for the same
                system resources

                usemem alone (stdev)  max_nice (stdev)  normal_prio (stdev)
                -----------------------------------------------------------
  padata_vfio                          14.74   ( 0.04)     9.93    ( 0.09)
  usemem          10.45     ( 0.04)    10.75   ( 0.04)    14.14    ( 0.07)

  Here the effect is similar, just not as pronounced.  The usemem
  threads take 35% longer to finish with normal priority padata threads
  than when run alone, but only 3% longer when MAX_NICE is used.

[1] lkml.kernel.org/r/20171206143509.GG7515@xxxxxxxxxxxxxx

Signed-off-by: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
---
 kernel/padata.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/padata.c b/kernel/padata.c
index ef6589a6b665..83e86724b3e1 100644
--- a/kernel/padata.c
+++ b/kernel/padata.c
@@ -638,7 +638,10 @@ int padata_do_multithreaded_job(struct padata_mt_job *job,
 		if (IS_ERR(task)) {
 			--ps.nworks;
 		} else {
+			/* Helper threads shouldn't disturb other workloads. */
+			set_user_nice(task, MAX_NICE);
 			kthread_bind_mask(task, current->cpus_ptr);
+			wake_up_process(task);
 		}
 	}
-- 
2.34.1
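
As a side note for readers outside the padata series, here is a minimal
sketch of the same pattern applied by a hypothetical caller: create a
kthread, drop it to MAX_NICE so it only soaks up otherwise-idle CPU
time, bind it to the CPUs the caller is allowed to run on, then wake it.
The function and thread names below are illustrative and not part of
this patch; kthread_create_on_node(), set_user_nice(),
kthread_bind_mask() and wake_up_process() are the existing kernel APIs
the hunk above relies on.

  #include <linux/err.h>
  #include <linux/kthread.h>
  #include <linux/numa.h>
  #include <linux/sched.h>

  /* Hypothetical example; not part of this patch. */
  static int spawn_background_helper(int (*helper_fn)(void *), void *arg,
  				     int id)
  {
  	struct task_struct *task;

  	task = kthread_create_on_node(helper_fn, arg, NUMA_NO_NODE,
  				      "bg_helper/%d", id);
  	if (IS_ERR(task))
  		return PTR_ERR(task);

  	/* Only consume CPU time that would otherwise be idle. */
  	set_user_nice(task, MAX_NICE);
  	/* Stay within the CPUs the calling task may run on. */
  	kthread_bind_mask(task, current->cpus_ptr);
  	wake_up_process(task);

  	return 0;
  }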