Re: sysbench throughput degradation in 4.13+

On 09/13/2017 04:24 AM, 王金浦 wrote:
2017-09-12 16:14 GMT+02:00 Eric Farman <farman@xxxxxxxxxxxxxxxxxx>:
Hi Peter, Rik,

Running sysbench measurements in a 16CPU/30GB KVM guest on a 20CPU/40GB
s390x host, we noticed a throughput degradation (anywhere between 13% and
40%, depending on test) when moving the host from kernel 4.12 to 4.13.  The
rest of the host and the entire guest remain unchanged; it is only the host
kernel that changes.  Bisecting the host kernel blames commit 3fed382b46ba
("sched/numa: Implement NUMA node level wake_affine()").

Reverting 3fed382b46ba and 815abf5af45f ("sched/fair: Remove
effective_load()") from a clean 4.13.0 build erases the throughput
degradation and returns us to what we see in 4.12.0.
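
The reverts themselves were nothing special; roughly, on a clean v4.13
tree, reverting the later commit first:

  git checkout v4.13
  git revert 815abf5af45f   # sched/fair: Remove effective_load()
  git revert 3fed382b46ba   # sched/numa: Implement NUMA node level wake_affine()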

A little poking around points us to a fix/improvement for this, commit
90001d67be2f ("sched/fair: Fix wake_affine() for !NUMA_BALANCING"), which
went in the 4.14 merge window, and an unmerged fix [1] that corrects a
small error in that patch.  Hopeful because we were running
!NUMA_BALANCING, I applied these two patches to a clean 4.13.0 tree, but
continue to see the performance degradation.  Pulling current master or
linux-next shows no improvement lurking in the shadows.
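
(For completeness, a quick way to confirm whether NUMA balancing is built
in and/or enabled on a host is something like the following; paths per the
usual distro layout:

  grep NUMA_BALANCING /boot/config-$(uname -r)   # build-time options
  cat /proc/sys/kernel/numa_balancing            # runtime knob, present only if built in
)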

Running perf stat on the host during the guest sysbench run shows a
significant increase in cpu-migrations over the 4.12.0 run.  Abbreviated
examples follow:

# 4.12.0
# perf stat -p 11473 -- sleep 5
       62305.199305      task-clock (msec)         #   12.458 CPUs
            368,607      context-switches
              4,084      cpu-migrations
                416      page-faults

# 4.13.0
# perf stat -p 11444 -- sleep 5
       35892.653243      task-clock (msec)         #    7.176 CPUs
            249,251      context-switches
             56,850      cpu-migrations
                804      page-faults

# 4.13.0-revert-3fed382b46ba-and-815abf5af45f
# perf stat -p 11441 -- sleep 5
       62321.767146      task-clock (msec)         #   12.459 CPUs
            387,661      context-switches
              5,687      cpu-migrations
              1,652      page-faults

# 4.13.0-apply-90001d67be2f
# perf stat -p 11438 -- sleep 5
       48654.988291      task-clock (msec)         #    9.729 CPUs
            363,150      context-switches
             43,778      cpu-migrations
                641      page-faults
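
Back-of-the-envelope, normalizing the migration counts against task-clock:

  4.12.0                  :  4,084 / 62.3s  ~    66 migrations/s
  4.13.0                  : 56,850 / 35.9s  ~ 1,580 migrations/s
  4.13.0 + reverts        :  5,687 / 62.3s  ~    91 migrations/s
  4.13.0 + 90001d67be2f   : 43,778 / 48.7s  ~   900 migrations/s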

I'm not sure what additional data to supply here, and I'm unfamiliar with
this code and its recent changes, but I'd be happy to pull/try whatever is
needed to help debug things.  Looking forward to hearing what I can do.

Thanks,
Eric

[1] https://lkml.org/lkml/2017/9/6/196

+cc: vcaputo@xxxxxxxxxxx
He also reported a performance degradation on 4.13-rc7; it might have
the same cause.

Best,
Jack


Hi Peter, Rik,

With OSS last week, I'm sure this got lost in the deluge, so here's a friendly ping. I picked up 4.14.0-rc1 earlier this week and still see the degradation described above. That's not really a surprise, since I don't see any other commits in this area beyond the ones mentioned in my original note.

Anyway, I'm unsure what else to try or what documentation to pull to help debug this, and would appreciate your expertise here. We can reproduce this easily whenever needed to help get to the bottom of it.

Many thanks in advance,

 - Eric

(Also, +cc Matt, who can help while I'm out of the office myself.)



