Hi Ying,

> -----Original Message-----
> From: Huang, Ying <ying.huang@xxxxxxxxx>
> Sent: Friday, August 16, 2024 2:03 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; nphamcs@xxxxxxxxx;
> ryan.roberts@xxxxxxx; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> Subject: Re: [PATCH v2 0/4] mm: ZSWAP swap-out of mTHP folios
>
> Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx> writes:
>
> > Hi All,
> >
> > This patch-series enables zswap_store() to accept and store mTHP
> > folios. The most significant contribution in this series is from the
> > earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has been
> > migrated to v6.11-rc3 in patch 2/4 of this series.
> >
> > [1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
> >      https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@xxxxxxx/T/#u
> >
> > Additionally, there is an attempt to modularize some of the functionality
> > in zswap_store(), to make it more amenable to supporting any-order
> > mTHPs.
> >
> > For instance, the determination of whether a folio is same-filled is
> > based on mapping an index into the folio to derive the page. Likewise,
> > there is a function "zswap_store_entry" added to store a zswap_entry in
> > the xarray.
> >
> > For accounting purposes, the patch-series adds per-order mTHP sysfs
> > "zswpout" counters that get incremented upon successful zswap_store of
> > an mTHP folio:
> >
> > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout
> >
> > This patch-series is a precursor to ZSWAP compress batching of mTHP
> > swap-out and decompress batching of swap-ins based on
> > swapin_readahead(), using Intel IAA hardware acceleration, which we
> > would like to submit in subsequent RFC patch-series, with performance
> > improvement data.
> >
> > Thanks to Ying Huang for pre-posting review feedback and suggestions!
> >
> > Changes since RFC v1:
> > =====================
> >
> > 1) Use sysfs for zswpout mTHP stats, as per Barry Song's suggestion.
> >    Thanks Barry!
> > 2) Addressed some of the code review comments that Nhat Pham provided
> >    in Ryan's initial RFC [1]:
> >    - Added a comment about the cgroup zswap limit checks occurring once
> >      per folio at the beginning of zswap_store().
> >      Nhat, Ryan, please do let me know if the comments convey the
> >      summary from the RFC discussion. Thanks!
> >    - Posted data on running the cgroup suite's zswap kselftest.
> > 3) Rebased to v6.11-rc3.
> > 4) Gathered performance data with usemem and the rebased patch-series.
> >
> > Performance Testing:
> > ====================
> > Testing of this patch-series was done with the v6.11-rc3 mainline, without
> > and with this patch-series, on an Intel Sapphire Rapids server,
> > dual-socket 56 cores per socket, 4 IAA devices per socket.
> >
> > The system has 503 GiB RAM, 176 GiB swap/ZSWAP with ZRAM as the backing
> > swap device. Core frequency was fixed at 2500MHz.
>
> I don't think that this is a reasonable test configuration; there's no
> benefit to using ZSWAP+ZRAM.  We should use a normal SSD as the backing
> swap device.

Thanks for this suggestion. Sure, I will gather data using an SSD instead
of ZRAM as the backing swap device.

>
> > The vm-scalability "usemem" test was run in a cgroup whose memory.high
> > was fixed at 40G. Following a similar methodology as in Ryan Roberts'
> > "Swap-out mTHP without splitting" series [2], 70 usemem processes were
> > run, each allocating and writing 1G of memory:
> >
> >     usemem --init-time -w -O -n 70 1g
> >
> > Other kernel configuration parameters:
> >
> >     ZSWAP Compressor  : LZ4, DEFLATE-IAA
> >     ZSWAP Allocator   : ZSMALLOC
> >     ZRAM Compressor   : LZO-RLE
> >     SWAP page-cluster : 2
> >
> > In the experiments where "deflate-iaa" is used as the ZSWAP compressor,
> > IAA "compression verification" is enabled. Hence each IAA compression
> > will be decompressed internally by the "iaa_crypto" driver, the crc-s
> > returned by the hardware will be compared and errors reported in case of
> > mismatches. Thus "deflate-iaa" helps ensure better data integrity as
> > compared to the software compressors.
> >
> > Throughput reported by usemem and perf sys time for running the test
> > are as follows:
> >
> > 64KB mTHP:
> > ==========
> >  ------------------------------------------------------------------
> > |                    |                   |            |            |
> > |Kernel              | mTHP SWAP-OUT     | Throughput | Improvement|
> > |                    |                   |       KB/s |            |
> > |--------------------|-------------------|------------|------------|
> > |v6.11-rc3 mainline  | ZRAM lzo-rle      |    118,928 |   Baseline |
> > |zswap-mTHP-Store    | ZSWAP lz4         |     82,665 |       -30% |
>
> Because the test configuration isn't reasonable, the performance drop
> isn't reasonable either.  We should compare between zswap+SSD w/o mTHP
> zswap and zswap+SSD w/ mTHP zswap.  I think that there should be
> performance improvement for that.

Sure, I will gather and post the data with these two configurations.

Thanks,
Kanchana

>
> > |zswap-mTHP-Store    | ZSWAP deflate-iaa |    176,210 |        48% |
> > |------------------------------------------------------------------|
> > |                    |                   |            |            |
> > |Kernel              | mTHP SWAP-OUT     |   Sys time | Improvement|
> > |                    |                   |        sec |            |
> > |--------------------|-------------------|------------|------------|
> > |v6.11-rc3 mainline  | ZRAM lzo-rle      |   1,032.20 |   Baseline |
> > |zswap-mTHP-Store    | ZSWAP lz4         |   1,854.51 |       -80% |
> > |zswap-mTHP-Store    | ZSWAP deflate-iaa |     582.71 |        44% |
> >  ------------------------------------------------------------------
> >
> >  -----------------------------------------------------------------------
> > | VMSTATS, mTHP ZSWAP stats,   |  v6.11-rc3 |  zswap-mTHP |  zswap-mTHP |
> > | mTHP ZRAM stats:             |   mainline |       Store |       Store |
> > |                              |            |         lz4 | deflate-iaa |
> > |-----------------------------------------------------------------------|
> > | pswpin                       |         16 |           0 |           0 |
> > | pswpout                      |  7,770,720 |           0 |           0 |
> > | zswpin                       |        547 |         695 |         579 |
> > | zswpout                      |      1,394 |  15,462,778 |   7,284,554 |
> > |-----------------------------------------------------------------------|
> > | thp_swpout                   |          0 |           0 |           0 |
> > | thp_swpout_fallback          |          0 |           0 |           0 |
> > | pgmajfault                   |      3,786 |       3,541 |       3,367 |
> > |-----------------------------------------------------------------------|
> > | hugepages-64kB/stats/zswpout |            |     966,328 |     455,196 |
> > |-----------------------------------------------------------------------|
> > | hugepages-64kB/stats/swpout  |    485,670 |           0 |           0 |
> >  -----------------------------------------------------------------------
> >
> >
> > 2MB PMD-THP/2048K mTHP:
> > =======================
> >  ------------------------------------------------------------------
> > |                    |                   |            |            |
> > |Kernel              | mTHP SWAP-OUT     | Throughput | Improvement|
> > |                    |                   |       KB/s |            |
> > |--------------------|-------------------|------------|------------|
> > |v6.11-rc3 mainline  | ZRAM lzo-rle      |    177,340 |   Baseline |
> > |zswap-mTHP-Store    | ZSWAP lz4         |     84,030 |       -53% |
> > |zswap-mTHP-Store    | ZSWAP deflate-iaa |    185,691 |         5% |
> > |------------------------------------------------------------------|
> > |                    |                   |            |            |
> > |Kernel              | mTHP SWAP-OUT     |   Sys time | Improvement|
> > |                    |                   |        sec |            |
> > |--------------------|-------------------|------------|------------|
> > |v6.11-rc3 mainline  | ZRAM lzo-rle      |     876.29 |   Baseline |
> > |zswap-mTHP-Store    | ZSWAP lz4         |   1,740.55 |       -99% |
> > |zswap-mTHP-Store    | ZSWAP deflate-iaa |     650.33 |        26% |
> >  ------------------------------------------------------------------
> >
> >  -------------------------------------------------------------------------
> > | VMSTATS, mTHP ZSWAP stats,     |  v6.11-rc3 |  zswap-mTHP |  zswap-mTHP |
> > | mTHP ZRAM stats:               |   mainline |       Store |       Store |
> > |                                |            |         lz4 | deflate-iaa |
> > |-------------------------------------------------------------------------|
> > | pswpin                         |          0 |           0 |           0 |
> > | pswpout                        |  8,628,224 |           0 |           0 |
> > | zswpin                         |        678 |      22,733 |       1,641 |
> > | zswpout                        |      1,481 |  14,828,597 |   9,404,937 |
> > |-------------------------------------------------------------------------|
> > | thp_swpout                     |     16,852 |           0 |           0 |
> > | thp_swpout_fallback            |          0 |           0 |           0 |
> > | pgmajfault                     |      3,467 |      25,550 |       4,800 |
> > |-------------------------------------------------------------------------|
> > | hugepages-2048kB/stats/zswpout |            |      28,924 |      18,366 |
> > |-------------------------------------------------------------------------|
> > | hugepages-2048kB/stats/swpout  |     16,852 |           0 |           0 |
> >  -------------------------------------------------------------------------
> >
> > As expected, in the "Before" experiment, there are relatively fewer
> > swapouts because ZRAM utilization is not accounted in the cgroup.
> >
> > With the introduction of zswap_store mTHP, the "After" data reflects the
> > higher swapout activity, and consequent throughput/sys time degradation
> > when LZ4 is used as the zswap compressor.
> > However, we observe considerable throughput and sys time improvement
> > in the "After" data when DEFLATE-IAA is the zswap compressor. This
> > observation holds for 64K mTHP and 2MB THP experiments. The fewer
> > swap-outs and major page-faults can be attributed to IAA's higher
> > compression ratio and better compress latency, and they result in
> > better throughput and sys time.
> >
> > Our goal is to improve ZSWAP mTHP store performance using batching. With
> > Intel IAA compress/decompress batching used in ZSWAP (to be submitted as
> > additional RFC series), we are able to demonstrate significant
> > performance improvements and memory savings with IAA as compared to
> > software compressors.
> >
> > cgroup zswap kselftest:
> > =======================
> >
> > "Before":
> > =========
> >   Test run with v6.11-rc3 and no code changes:
> >     mTHP 64K set to 'always'
> >     zswap compressor set to 'lz4'
> >     page-cluster = 3
> >
> >   zswap shrinker_enabled = N:
> >   ---------------------------
> >   ok 1 test_zswap_usage
> >   ok 2 test_swapin_nozswap
> >   # at least 24MB should be brought back from zswap
> >   not ok 3 test_zswapin
> >   # zswpwb_after is 0 while wb is enablednot ok 4 test_zswap_writeback_enabled
> >   # Failed to reclaim all of the requested memory
> >   not ok 5 test_zswap_writeback_disabled
> >   ok 6 # SKIP test_no_kmem_bypass
> >   ok 7 test_no_invasive_cgroup_shrink
> >
> >   zswap shrinker_enabled = Y:
> >   ---------------------------
> >   ok 1 test_zswap_usage
> >   ok 2 test_swapin_nozswap
> >   # at least 24MB should be brought back from zswap
> >   not ok 3 test_zswapin
> >   # zswpwb_after is 0 while wb is enablednot ok 4 test_zswap_writeback_enabled
> >   # Failed to reclaim all of the requested memory
> >   not ok 5 test_zswap_writeback_disabled
> >   ok 6 # SKIP test_no_kmem_bypass
> >   not ok 7 test_no_invasive_cgroup_shrink
> >
> > "After":
> > ========
> >   Test run with this patch-series and v6.11-rc3:
> >     mTHP 64K set to 'always'
> >     zswap compressor set to 'deflate-iaa'
> >     page-cluster = 3
> >
> >   zswap shrinker_enabled = N:
> >   ---------------------------
> >   ok 1 test_zswap_usage
> >   ok 2 test_swapin_nozswap
> >   ok 3 test_zswapin
> >   ok 4 test_zswap_writeback_enabled
> >   ok 5 test_zswap_writeback_disabled
> >   ok 6 # SKIP test_no_kmem_bypass
> >   ok 7 test_no_invasive_cgroup_shrink
> >
> >   zswap shrinker_enabled = Y:
> >   ---------------------------
> >   ok 1 test_zswap_usage
> >   ok 2 test_swapin_nozswap
> >   # at least 24MB should be brought back from zswap
> >   not ok 3 test_zswapin
> >   ok 4 test_zswap_writeback_enabled
> >   ok 5 test_zswap_writeback_disabled
> >   ok 6 # SKIP test_no_kmem_bypass
> >   not ok 7 test_no_invasive_cgroup_shrink
> >
> > I haven't taken an in-depth look into the cgroup zswap tests, but it
> > looks like the results with the patch-series are no worse than without,
> > and in some cases better (not exactly sure why, this needs more
> > analysis).
> >
> > I would greatly appreciate your code review comments and suggestions!
> >
> > Thanks,
> > Kanchana
> >
> > [2] https://lore.kernel.org/linux-mm/20240408183946.2991168-1-ryan.roberts@xxxxxxx/
> >
> >
> > Kanchana P Sridhar (4):
> >   mm: zswap: zswap_is_folio_same_filled() takes an index in the folio.
> >   mm: zswap: zswap_store() extended to handle mTHP folios.
> >   mm: Add MTHP_STAT_ZSWPOUT to sysfs per-order mthp stats.
> >   mm: swap: Count successful mTHP ZSWAP stores in sysfs mTHP stats.
> >
> >  include/linux/huge_mm.h |   1 +
> >  mm/huge_memory.c        |   2 +
> >  mm/page_io.c            |   7 ++
> >  mm/zswap.c              | 238 +++++++++++++++++++++++++++++-----------
> >  4 files changed, 184 insertions(+), 64 deletions(-)
>
> --
> Best Regards,
> Huang, Ying
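
For reference, the "Improvement" columns in the quoted tables can be
reproduced from the raw numbers. This is only a minimal Python sketch,
assuming that throughput improvement is (new - baseline) / baseline, that
sys time improvement is (baseline - new) / baseline, and that both are
rounded to the nearest percent. The function names are illustrative and are
not part of the patch-series.

```python
def throughput_improvement(baseline_kbps: float, new_kbps: float) -> int:
    """Percent change in throughput (assumed: higher is better)."""
    return round(100 * (new_kbps - baseline_kbps) / baseline_kbps)


def sys_time_improvement(baseline_sec: float, new_sec: float) -> int:
    """Percent reduction in sys time (assumed: lower is better)."""
    return round(100 * (baseline_sec - new_sec) / baseline_sec)


# 64KB mTHP numbers from the quoted tables; baseline is ZRAM lzo-rle.
print(throughput_improvement(118_928, 82_665))    # ZSWAP lz4
print(throughput_improvement(118_928, 176_210))   # ZSWAP deflate-iaa
print(sys_time_improvement(1_032.20, 1_854.51))   # ZSWAP lz4
print(sys_time_improvement(1_032.20, 582.71))     # ZSWAP deflate-iaa
```

Under these assumptions a negative value is a regression relative to the
baseline, which matches the sign convention used in the tables.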