Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx> writes:

[snip]

>
> Case 1: Comparing zswap 4K vs. zswap mTHP
> =========================================
>
> In this scenario, the "before" is CONFIG_THP_SWAP set to off, that results in
> 64K/2M (m)THP to be split into 4K folios that get processed by zswap.
>
> The "after" is CONFIG_THP_SWAP set to on, and this patch-series, that results
> in 64K/2M (m)THP to not be split, and processed by zswap.
>
>  64KB mTHP (cgroup memory.high set to 40G):
>  ==========================================
>
>  -------------------------------------------------------------------------------
>                     mm-unstable 9-23-2024              zswap-mTHP     Change wrt
>                         CONFIG_THP_SWAP=N       CONFIG_THP_SWAP=Y       Baseline
>                                  Baseline
>  -------------------------------------------------------------------------------
>  ZSWAP compressor       zstd     deflate-        zstd    deflate-  zstd deflate-
>                                       iaa                     iaa            iaa
>  -------------------------------------------------------------------------------
>  Throughput (KB/s)   143,323      125,485     153,550     129,609    7%       3%
>  elapsed time (sec)    24.97        25.42       23.90       25.19    4%       1%
>  sys time (sec)       822.72       750.96      757.70      731.13    8%       3%
>  memcg_high          132,743      169,825     148,075     192,744
>  memcg_swap_fail     639,067      841,553       2,204       2,215
>  pswpin                    0            0           0           0
>  pswpout                   0            0           0           0
>  zswpin                  795          873         760         902
>  zswpout          10,011,266   13,195,137  10,010,017  13,193,554
>  thp_swpout                0            0           0           0
>  thp_swpout_               0            0           0           0
>   fallback
>  64kB-mthp_          639,065      841,553       2,204       2,215
>   swpout_fallback
>  pgmajfault            2,861        2,924       3,054       3,259
>  ZSWPOUT-64kB            n/a          n/a     623,451     822,268
>  SWPOUT-64kB               0            0           0           0
>  -------------------------------------------------------------------------------
>

IIUC, the throughput is the sum of the throughput of all usemem processes?

One possible issue with the usemem test case is the "imbalance" issue.
That is, some usemem processes may swap-out/swap-in less, so their score
is very high, while some other processes may swap-out/swap-in more, so
their score is very low.
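
As a rough illustration of how such imbalance could be checked, here is a
minimal sketch.  It assumes the per-process usemem throughput values are
available as plain numbers.  The helper name balance_report and the sample
values are hypothetical; they are not part of usemem and are not taken from
the results quoted above.

```python
from statistics import mean, pstdev


def balance_report(per_process_kbps):
    """Summarize how evenly throughput is spread across processes."""
    avg = mean(per_process_kbps)
    # Coefficient of variation: 0.0 means every process scored the same.
    cv = pstdev(per_process_kbps) / avg if avg else 0.0
    return {
        "total": sum(per_process_kbps),
        "min": min(per_process_kbps),
        "max": max(per_process_kbps),
        "cv": cv,
    }


# Made-up per-process scores in KB/s, for illustration only.
imbalanced = balance_report([9000, 9000, 1000, 1000])
balanced = balance_report([4800, 4900, 4700, 4600])

print("imbalanced:", imbalanced)
print("balanced:  ", balanced)
```

In this made-up data the second run has the lower total but a much smaller
spread between processes, so comparing the two runs on the summed throughput
alone would hide the difference in balance.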
Sometimes, the total score decreases, but the scores of the usemem
processes are more balanced, so the performance should be considered
better.  And, in general, we should make the usemem scores balanced among
processes via, say, a longer test time.  Can you check this in your test
results?

[snip]

--
Best Regards,
Huang, Ying