"Sridhar, Kanchana P" <kanchana.p.sridhar@xxxxxxxxx> writes: [snip] > > Thanks, these are good points. I ran this experiment with mm-unstable 9-17-2024, > commit 248ba8004e76eb335d7e6079724c3ee89a011389. > > Data is based on average of 3 runs of the vm-scalability "usemem" test. > > 4G SSD backing zswap, each process sleeps before exiting > ======================================================== > > 64KB mTHP (cgroup memory.high set to 60G, no swap limit): > ========================================================= > CONFIG_THP_SWAP=Y > Sapphire Rapids server with 503 GiB RAM and 4G SSD swap backing device > for zswap. > > Experiment 1: Each process sleeps for 0 sec after allocating memory > (usemem --init-time -w -O --sleep 0 -n 70 1g): > > ------------------------------------------------------------------------------- > mm-unstable 9-17-2024 zswap-mTHP v6 Change wrt > Baseline Baseline > "before" "after" (sleep 0) > ------------------------------------------------------------------------------- > ZSWAP compressor zstd deflate- zstd deflate- zstd deflate- > iaa iaa iaa > ------------------------------------------------------------------------------- > Throughput (KB/s) 296,684 274,207 359,722 390,162 21% 42% > sys time (sec) 92.67 93.33 251.06 237.56 -171% -155% > memcg_high 3,503 3,769 44,425 27,154 > memcg_swap_fail 0 0 115,814 141,936 > pswpin 17 0 0 0 > pswpout 370,853 393,232 0 0 > zswpin 693 123 666 667 > zswpout 1,484 123 1,366,680 1,199,645 > thp_swpout 0 0 0 0 > thp_swpout_ 0 0 0 0 > fallback > pgmajfault 3,384 2,951 3,656 3,468 > ZSWPOUT-64kB n/a n/a 82,940 73,121 > SWPOUT-64kB 23,178 24,577 0 0 > ------------------------------------------------------------------------------- > > > Experiment 2: Each process sleeps for 10 sec after allocating memory > (usemem --init-time -w -O --sleep 10 -n 70 1g): > > ------------------------------------------------------------------------------- > mm-unstable 9-17-2024 zswap-mTHP v6 Change wrt > Baseline Baseline > "before" "after" (sleep 10) > ------------------------------------------------------------------------------- > ZSWAP compressor zstd deflate- zstd deflate- zstd deflate- > iaa iaa iaa > ------------------------------------------------------------------------------- > Throughput (KB/s) 86,744 93,730 157,528 113,110 82% 21% > sys time (sec) 308.87 315.29 477.55 629.98 -55% -100% What is the elapsed time for all cases? > memcg_high 169,450 188,700 143,691 177,887 > memcg_swap_fail 10,131,859 9,740,646 18,738,715 19,528,110 > pswpin 17 16 0 0 > pswpout 1,154,779 1,210,485 0 0 > zswpin 711 659 1,016 736 > zswpout 70,212 50,128 1,235,560 1,275,917 > thp_swpout 0 0 0 0 > thp_swpout_ 0 0 0 0 > fallback > pgmajfault 6,120 6,291 8,789 6,474 > ZSWPOUT-64kB n/a n/a 67,587 68,912 > SWPOUT-64kB 72,174 75,655 0 0 > ------------------------------------------------------------------------------- > > > Conclusions from the experiments: > ================================= > 1) zswap-mTHP improves throughput as compared to the baseline, for zstd and > deflate-iaa. > > 2) Yosry's theory is proved correct in the 4G constrained swap setup. > When the processes are constrained to sleep 10 sec after allocating > memory, thereby keeping the memory allocated longer, the "Baseline" or > "before" with mTHP getting stored in SSD shows a degradation of 71% in > throughput and 238% in sys time, as compared to the "Baseline" with Higher sys time may come from compression with CPU vs. disk writing? > sleep 0 that benefits from serialization of disk IO not allowing all > processes to allocate memory at the same time. > > 3) In the 4G SSD "sleep 0" case, zswap-mTHP shows an increase in sys time > due to the cgroup charging and consequently higher memcg.high breaches > and swapout activity. > > However, the "sleep 10" case's sys time seems to degrade less, and the > memcg.high breaches and swapout activity are almost similar between the > before/after (confirming Yosry's hypothesis). Further, the > memcg_swap_fail activity in the "after" scenario is almost 2X that of > the "before". This indicates failure to obtain swap offsets, resulting > in the folio remaining active in memory. > > I tried to better understand this through the 64k mTHP swpout_fallback > stats in the "sleep 10" zstd experiments: > > -------------------------------------------------------------- > "before" "after" > -------------------------------------------------------------- > 64k mTHP swpout_fallback 627,308 897,407 > 64k folio swapouts 72,174 67,587 > [p|z]swpout events due to 64k mTHP 1,154,779 1,081,397 > 4k folio swapouts 70,212 154,163 > -------------------------------------------------------------- > > The data indicates a higher # of 64k folio swpout_fallback with > zswap-mTHP, that co-relates with the higher memcg_swap_fail counts and > 4k folio swapouts with zswap-mTHP. Could the root-cause be fragmentation > of the swap space due to zswap swapout being faster than SSD swapout? > [snip] -- Best Regards, Huang, Ying