[Linux Kernel 5.14 GA] ESXi Performance degradation

As part of VMware's performance regression testing of upstream Linux
kernel releases, we evaluated Linux kernel 5.14 against the 5.13
release and would like to share the following observations. We noticed
performance degradation of up to 25% in ESXi networking workloads and
up to 5% in ESXi storage workloads. On the networking side, the
degradation shows up in the Netperf "TCP_STREAM_RECV large packets"
throughput tests (up to 25%). On the storage side, the degradation
shows up only in the CPU cost metric (up to 5%).

After bisecting between kernel 5.13 and 5.14, we identified the cause
as the memory allocation behavior introduced by Mel's commit
44042b4498728f4376e84bae1ac8016d146d850b ("mm/page_alloc: allow
high-order pages to be stored on the per-cpu lists").

To confirm this, we backed out the above commit from 5.14, re-ran our
tests, and found that performance was on par with the 5.13 kernel.

Immediate before commit: 43b02ba93b25b1caff7a3457fc5d005485e78da5
Mel's commit: 44042b4498728f4376e84bae1ac8016d146d850b
Mel’s commit git URL:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=44042b4498728f4376e84bae1ac8016d146d850b

To analyse the degradation further, we collected perf stats for Mel's
commit and for the commit immediately before it while running the
Netperf benchmark. The cache-miss count is much higher with Mel's
commit than with the commit immediately before it. The perf stat data
collected while running the netperf TCP_STREAM tests is below.

Performance counter stats for 'system wide':
Immediate before commit:
cache-references - 5,343,078,363
cache-misses - 26,632,656 (0.498 % of all cache refs)

Mel's commit:
cache-references - 4,930,300,091
cache-misses - 319,495,743 (6.480 % of all cache refs)
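
As a quick cross-check of the percentages above, the following minimal
sketch recomputes the miss rate from the raw counters. The counter
values are copied from the perf stat output in this mail; the function
and variable names are made up for illustration only.

```python
def miss_rate(cache_misses: int, cache_references: int) -> float:
    """Return cache misses as a percentage of all cache references."""
    return 100.0 * cache_misses / cache_references


# Counter values copied from the perf stat output in this mail.
before = {"refs": 5_343_078_363, "misses": 26_632_656}   # immediate before commit
after = {"refs": 4_930_300_091, "misses": 319_495_743}   # Mel's commit

before_rate = miss_rate(before["misses"], before["refs"])
after_rate = miss_rate(after["misses"], after["refs"])
# How many times more cache misses were counted with Mel's commit.
miss_count_ratio = after["misses"] / before["misses"]

if __name__ == "__main__":
    print(f"immediate before commit: {before_rate:.3f} % of all cache refs")
    print(f"Mel's commit:            {after_rate:.3f} % of all cache refs")
    print(f"cache-miss count ratio:  {miss_count_ratio:.1f}x")
```

The absolute number of cache misses rises by roughly 12x even though
the number of cache references is slightly lower with Mel's commit.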

We synced up with Mel offline and ran the experiments he requested.
He identified the root cause of the performance degradation and
provided a patch for us to validate. We validated his patch and
confirmed that it fixes the degradation: the performance numbers are
on par with kernel 5.13.

Performance data: 
TCP_STREAM_RECV Throughput: 
Immediate before commit: 16.394 Gbps 
Mel's commit: 15.465 Gbps 
Mel's patch: 16.461 Gbps
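
For reference, the relative change implied by the three throughput
figures above can be computed as in the sketch below. It uses the
commit immediately before Mel's commit as the baseline; the helper
name is hypothetical, and the figures are only the ones listed in this
mail for this particular run.

```python
def pct_change(value: float, baseline: float) -> float:
    """Return the change of value relative to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# TCP_STREAM_RECV throughput in Gbps, copied from this mail.
BASELINE_GBPS = 16.394     # immediate before commit
MEL_COMMIT_GBPS = 15.465   # Mel's commit
MEL_PATCH_GBPS = 16.461    # Mel's commit plus his fix

regression_pct = pct_change(MEL_COMMIT_GBPS, BASELINE_GBPS)
patched_pct = pct_change(MEL_PATCH_GBPS, BASELINE_GBPS)

if __name__ == "__main__":
    print(f"Mel's commit vs. baseline: {regression_pct:+.2f} %")
    print(f"Mel's patch vs. baseline:  {patched_pct:+.2f} %")
```

For these figures the commit costs about 5.7% of throughput, and the
patched kernel lands within about half a percent of the baseline.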

Patch URL:
https://lore.kernel.org/all/20220217002227.5739-1-mgorman@xxxxxxxxxxxxxxxxxxx/

Since we received the fix for the reported degradation from Mel
offline, we wanted to document it on this list for reference.

Since we observe performance degradation due to this commit
(44042b4498728f4376e84bae1ac8016d146d850b), could you please
backport this fix to the kernel 5.14 release?

Manikandan Jagatheesan
Performance Engineering
VMware, Inc.



