This is a second round of performance-related backports of low-hanging fruit from the 4.13 merge window, applied on top of 4.12.2. As before, these have only been tested on 4.12-stable. While they may merge against older kernels, I have no data on how they behave there and cannot guarantee it's a good idea, so I don't recommend it. There will also be some major conflicts that are not trivial to resolve. For most of the tests I conducted, the impact is marginal, but the first two sets of patches are important for large machines and for users of nohz_full. The load balancing patch addresses a fairly specific case but the effect is measurable. The removal of unnecessary IRQ disabling/enabling is borderline in terms of performance, but the patches are trivial and avoiding unnecessary expensive operations is always a plus.

Patches 1-17 resolve a number of topology problems in the scheduler that primarily impact NUMA machines with a ring topology. There are more patches in there than strictly necessary, but one adds very helpful comments on understanding how the code works, and a few bring the naming of functions in line with 4.13, which makes it a bit easier to follow. Others shuffle comments around and restructure the code, which could have been avoided, but then the backported patches would not look like their upstream equivalents. While some of the extra patches are outside the scope of -stable, they remove the delta when comparing the 4.12-stable and 4.13 scheduler; I can drop them if necessary. Performance impact on UMA and fully-connected machines is marginal, with minor gains/losses across multiple machines that are mostly within the noise, but other reports indicate that the impact on ring topologies is substantial. In particular, the full machine will be properly utilised instead of saturating a subset of nodes for workloads with lots of threads or processes. (A sketch for inspecting a machine's topology from userspace is at the end of this mail.)

Patches 18-22 are more about accounting than performance. The bug affects workloads running on nohz_full+isolcpus configurations. If two or more processes running on an isolated CPU are 100% userspace-bound while normal processes are running on other CPUs, then the isolated processes report a mix of userspace and system CPU usage. It can be up to 100% system CPU usage even though in reality no time is being spent in the kernel. This misaccounting is confusing when analysing workloads. For normal workloads, there is no measurable difference. (The second sketch below shows one way to observe this.)

Patch 23 fixes a scheduler load balancing issue where an imbalanced domain is considered balanced when some tasks are pinned for affinity. Again, for many workloads the impact is marginal, but it was a small boost (1-2%, barely outside the noise) for a specjbb configuration that pinned JVMs. It may be coincidence, but the patch is straightforward. (The third sketch below shows the kind of pinning involved.)

Patches 24-25 avoid unnecessary IRQ disabling/enabling while updating writeback stats. In many cases this will not be noticeable because it happens out-of-band and the cost of the stats updates is often negligible compared to the overall cost of writeback. However, unnecessary IRQ disabling is never a good thing, and it may be noticeable during writeback to ultra-fast storage.

Patch 26 avoids an IRQ disable/enable in the fork path. It's noticeable on fork-intensive workloads, with, for example, a 1-3% boost on hackbench that is just outside the noise. (The last sketch below is a crude fork microbenchmark along those lines.)
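For anyone who wants to check what topology a machine reports, the NUMA distance matrix the kernel exports in sysfs is enough to tell a ring apart from a fully-connected machine: on a ring, remote distances grow with the number of hops instead of all being equal. The following is a minimal illustrative sketch of my own (not part of the series) that prints the matrix; it assumes node numbering is contiguous from 0:

  /* Print the NUMA distance matrix from sysfs. Illustration only. */
  #include <stdio.h>

  int main(void)
  {
          char path[64];
          int node;

          /* Assumes node numbering is contiguous from 0. */
          for (node = 0; ; node++) {
                  FILE *f;
                  char line[256];

                  snprintf(path, sizeof(path),
                           "/sys/devices/system/node/node%d/distance", node);
                  f = fopen(path, "r");
                  if (!f)
                          break;          /* no more nodes */
                  if (fgets(line, sizeof(line), f))
                          printf("node%d: %s", node, line);
                  fclose(f);
          }
          return 0;
  }

On a fully-connected machine every off-diagonal entry is typically the same value; a ring shows two or more distinct remote distances.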
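To observe the accounting problem addressed by patches 18-22, it is enough to run two pure userspace spinners pinned to the same isolated nohz_full CPU (e.g. with taskset) and compare the reported user vs system time. A minimal sketch of such a spinner (illustrative, not part of the series):

  /* Spin purely in userspace, then report user vs system CPU time.
   * Illustration only: run two copies pinned to the same isolated
   * nohz_full CPU; on an affected kernel the reported system time
   * is inflated even though no time is spent in the kernel.
   */
  #include <stdio.h>
  #include <time.h>
  #include <sys/time.h>
  #include <sys/resource.h>

  int main(void)
  {
          struct rusage ru;
          time_t start = time(NULL);

          while (time(NULL) - start < 10)
                  ;       /* burn CPU in userspace for ~10 seconds */

          getrusage(RUSAGE_SELF, &ru);
          printf("user   %ld.%06lds\n",
                 (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
          printf("system %ld.%06lds\n",
                 (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
          return 0;
  }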
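The pinning in the specjbb configuration relevant to patch 23 is ordinary hard affinity. For readers unfamiliar with it, a minimal sketch (illustrative; real JVM pinning would more typically be done with taskset or numactl):

  /* Pin the current process to CPU 0 via sched_setaffinity().
   * Illustration only: this is the kind of hard affinity that could
   * leave an imbalanced domain looking balanced before patch 23.
   */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  int main(void)
  {
          cpu_set_t set;

          CPU_ZERO(&set);
          CPU_SET(0, &set);       /* restrict this process to CPU 0 */

          if (sched_setaffinity(0, sizeof(set), &set)) {
                  perror("sched_setaffinity");
                  return 1;
          }
          printf("pinned to CPU 0\n");
          return 0;
  }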
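Finally, for patch 26, any fork-heavy loop will exercise the affected path. A crude sketch in the spirit of hackbench (illustrative only; the 1-3% figure quoted above was measured with hackbench itself):

  /* Time a loop of fork()+waitpid() cycles. Illustration only: a
   * crude fork-intensive microbenchmark exercising the path where
   * patch 26 removes an IRQ disable/enable.
   */
  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>
  #include <sys/types.h>
  #include <sys/wait.h>

  int main(void)
  {
          struct timespec t0, t1;
          int i;

          clock_gettime(CLOCK_MONOTONIC, &t0);
          for (i = 0; i < 10000; i++) {
                  pid_t pid = fork();

                  if (pid < 0) {
                          perror("fork");
                          return 1;
                  }
                  if (pid == 0)
                          _exit(0);       /* child exits immediately */
                  waitpid(pid, NULL, 0);
          }
          clock_gettime(CLOCK_MONOTONIC, &t1);

          printf("10000 fork+wait cycles: %.3f seconds\n",
                 (t1.tv_sec - t0.tv_sec) +
                 (t1.tv_nsec - t0.tv_nsec) / 1e9);
          return 0;
  }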