Hi Peter, Rik,

Running sysbench measurements in a 16CPU/30GB KVM guest on a 20CPU/40GB
s390x host, we noticed a throughput degradation (anywhere between 13%
and 40%, depending on the test) when moving the host from kernel 4.12 to
4.13. The rest of the host and the entire guest remain unchanged; it is
only the host kernel that changes. Bisecting the host kernel blames
commit 3fed382b46ba ("sched/numa: Implement NUMA node level wake_affine()").

Reverting 3fed382b46ba and 815abf5af45f ("sched/fair: Remove
effective_load()") from a clean 4.13.0 build erases the throughput
degradation and returns us to what we see in 4.12.0.

A little poking around points us to a fix/improvement to this, commit
90001d67be2f ("sched/fair: Fix wake_affine() for !NUMA_BALANCING"),
which went in during the 4.14 merge window, and an unmerged fix [1] that
corrects a small error in that patch. Hopeful, since we were running
!NUMA_BALANCING, I applied these two patches to a clean 4.13.0 tree but
continue to see the performance degradation. Pulling current master or
linux-next shows no improvement lurking in the shadows.

Running perf stat on the host during the guest sysbench run shows a
significant increase in cpu-migrations over the 4.12.0 run. Abbreviated
examples follow:

# 4.12.0
# perf stat -p 11473 -- sleep 5
    62305.199305  task-clock (msec)  # 12.458 CPUs
         368,607  context-switches
           4,084  cpu-migrations
             416  page-faults

# 4.13.0
# perf stat -p 11444 -- sleep 5
    35892.653243  task-clock (msec)  # 7.176 CPUs
         249,251  context-switches
          56,850  cpu-migrations
             804  page-faults

# 4.13.0-revert-3fed382b46ba-and-815abf5af45f
# perf stat -p 11441 -- sleep 5
    62321.767146  task-clock (msec)  # 12.459 CPUs
         387,661  context-switches
           5,687  cpu-migrations
           1,652  page-faults

# 4.13.0-apply-90001d67be2f
# perf stat -p 11438 -- sleep 5
    48654.988291  task-clock (msec)  # 9.729 CPUs
         363,150  context-switches
          43,778  cpu-migrations
             641  page-faults
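As a rough way to compare the four runs, the perf stat counters above can be
normalized against the 4.12.0 baseline. The sketch below is illustrative
only: the helper name migration_summary and the choice of "migrations per
1000 context switches" as a metric are assumptions made for this comparison,
not something taken from perf or from the scheduler code. The counter values
are copied from the abbreviated output above.

```python
# Counters copied from the abbreviated perf stat output (5 second samples).
RUNS = {
    "4.12.0": {"context_switches": 368607, "cpu_migrations": 4084},
    "4.13.0": {"context_switches": 249251, "cpu_migrations": 56850},
    "4.13.0-revert-3fed382b46ba-and-815abf5af45f": {
        "context_switches": 387661,
        "cpu_migrations": 5687,
    },
    "4.13.0-apply-90001d67be2f": {
        "context_switches": 363150,
        "cpu_migrations": 43778,
    },
}
BASELINE = "4.12.0"


def migration_summary(runs, baseline):
    """Hypothetical helper: compare cpu-migrations of each run to a baseline.

    Returns, per run, the migration count relative to the baseline run and
    the number of migrations per 1000 context switches.
    """
    base_migrations = runs[baseline]["cpu_migrations"]
    summary = {}
    for name, counters in runs.items():
        summary[name] = {
            "vs_baseline": counters["cpu_migrations"] / base_migrations,
            "per_1000_cs": 1000.0
            * counters["cpu_migrations"]
            / counters["context_switches"],
        }
    return summary


if __name__ == "__main__":
    for name, stats in migration_summary(RUNS, BASELINE).items():
        print(
            f"{name:45s} {stats['vs_baseline']:6.1f}x baseline  "
            f"{stats['per_1000_cs']:7.1f} migrations/1000 cs"
        )
```

With these numbers, 4.13.0 shows roughly 14 times the cpu-migrations of
4.12.0, the double revert brings that back to about 1.4 times, and applying
90001d67be2f alone still leaves it near 11 times.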
I'm not sure what doc to supply here and am unfamiliar with this code or
its recent changes, but I'd be happy to pull/try whatever is needed to
help debug things. Looking forward to hearing what I can do.

Thanks,
Eric

[1] https://lkml.org/lkml/2017/9/6/196