Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx> writes:

[snip]

>
> Performance Testing:
> ====================
> Testing of this patch-series was done with the v6.11-rc3 mainline, without
> and with this patch-series, on an Intel Sapphire Rapids server,
> dual-socket 56 cores per socket, 4 IAA devices per socket.
>
> The system has 503 GiB RAM, with a 4G SSD as the backing swap device for
> ZSWAP. Core frequency was fixed at 2500MHz.
>
> The vm-scalability "usemem" test was run in a cgroup whose memory.high
> was fixed. Following a similar methodology as in Ryan Roberts'
> "Swap-out mTHP without splitting" series [2], 70 usemem processes were
> run, each allocating and writing 1G of memory:
>
>     usemem --init-time -w -O -n 70 1g
>
> Since I was constrained to get the 70 usemem processes to generate
> swapout activity with the 4G SSD, I ended up using different cgroup
> memory.high fixed limits for the experiments with 64K mTHP and 2M THP:
>
>     64K mTHP experiments: cgroup memory fixed at 60G
>     2M THP experiments  : cgroup memory fixed at 55G
>
> The vm/sysfs stats included after the performance data provide details
> on the swapout activity to SSD/ZSWAP.
>
> Other kernel configuration parameters:
>
>     ZSWAP Compressor  : LZ4, DEFLATE-IAA
>     ZSWAP Allocator   : ZSMALLOC
>     SWAP page-cluster : 2
>
> In the experiments where "deflate-iaa" is used as the ZSWAP compressor,
> IAA "compression verification" is enabled. Hence each IAA compression
> will be decompressed internally by the "iaa_crypto" driver, the CRCs
> returned by the hardware will be compared, and errors reported in case
> of mismatches. Thus "deflate-iaa" helps ensure better data integrity as
> compared to the software compressors.
>
> Throughput reported by usemem and perf sys time for running the test
> are as follows, averaged across 3 runs:
>
> 64KB mTHP (cgroup memory.high set to 60G):
> ==========================================
>  ------------------------------------------------------------------
> |                    |                   |            |            |
> |Kernel              | mTHP SWAP-OUT     | Throughput | Improvement|
> |                    |                   |    KB/s    |            |
> |--------------------|-------------------|------------|------------|
> |v6.11-rc3 mainline  | SSD               |  335,346   |  Baseline  |
> |zswap-mTHP-Store    | ZSWAP lz4         |  271,558   |      -19%  |

zswap throughput is worse than SSD swap? This doesn't look right.

> |zswap-mTHP-Store    | ZSWAP deflate-iaa |  388,154   |       16%  |
> |------------------------------------------------------------------|
> |                    |                   |            |            |
> |Kernel              | mTHP SWAP-OUT     |  Sys time  | Improvement|
> |                    |                   |    sec     |            |
> |--------------------|-------------------|------------|------------|
> |v6.11-rc3 mainline  | SSD               |   91.37    |  Baseline  |
> |zswap-mTHP-Store    | ZSWAP lz4         |  265.43    |     -191%  |
> |zswap-mTHP-Store    | ZSWAP deflate-iaa |  235.60    |     -158%  |
>  ------------------------------------------------------------------
>
>  ----------------------------------------------------------------------
> | VMSTATS, mTHP ZSWAP/SSD stats | v6.11-rc3 | zswap-mTHP | zswap-mTHP  |
> |                               | mainline  | Store      | Store       |
> |                               |           | lz4        | deflate-iaa |
> |----------------------------------------------------------------------|
> | pswpin                        |         0 |          0 |           0 |
> | pswpout                       |   174,432 |          0 |           0 |
> | zswpin                        |       703 |        534 |         721 |
> | zswpout                       |     1,501 |  1,491,654 |   1,398,805 |

It appears that the number of pages swapped out with zswap is much
larger than with SSD swap. Why? I guess this is why zswap throughput
is worse.

> |----------------------------------------------------------------------|
> | thp_swpout                    |         0 |          0 |           0 |
> | thp_swpout_fallback           |         0 |          0 |           0 |
> | pgmajfault                    |     3,364 |      3,650 |       3,431 |
> |----------------------------------------------------------------------|
> | hugepages-64kB/stats/zswpout  |           |     63,200 |      63,244 |
> |----------------------------------------------------------------------|
> | hugepages-64kB/stats/swpout   |    10,902 |          0 |           0 |
>  ----------------------------------------------------------------------

[snip]
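If I understand the test setup above correctly, it is roughly
equivalent to the sketch below, assuming the standard zswap module
parameters and a cgroup-v2 hierarchy. The cgroup name ("test"), the
mTHP policy, and the swap-device handling are guesses on my part, not
details taken from the report.

    #!/bin/sh
    # Rough reproduction sketch; run as root.  A swap area on the 4G SSD
    # is assumed to be active already (swapon), usemem from vm-scalability
    # is assumed to be in PATH, and the memory controller is assumed to be
    # enabled in the parent's cgroup.subtree_control.

    # zswap configuration: compressor, allocator, enable.
    echo lz4      > /sys/module/zswap/parameters/compressor   # or deflate-iaa
    echo zsmalloc > /sys/module/zswap/parameters/zpool
    echo Y        > /sys/module/zswap/parameters/enabled

    # Swap readahead of 2^2 = 4 pages.
    echo 2 > /proc/sys/vm/page-cluster

    # Allow 64K mTHP allocations (policy is a guess).
    echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled

    # cgroup with a fixed memory.high: 60G for the 64K mTHP runs,
    # 55G for the 2M THP runs.
    mkdir -p /sys/fs/cgroup/test
    echo 60G > /sys/fs/cgroup/test/memory.high
    echo $$  > /sys/fs/cgroup/test/cgroup.procs

    # vm-scalability workload: 70 processes, each writing 1G.
    usemem --init-time -w -O -n 70 1g

For the deflate-iaa runs, only the compressor string would change.
Please correct me if the actual setup differs materially from this.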
--
Best Regards,
Huang, Ying