Hi Herbert,

On Fri, 2024-04-05 at 15:07 +0800, Herbert Xu wrote:
> On Thu, Mar 28, 2024 at 10:44:41AM -0700, Andre Glover wrote:
> >
> > Below is a table showing the latency improvements with zlib,
> > between zlib dynamic and zlib canned modes, and the compression
> > ratio for each mode while using a set of 4300 4KB pages sampled
> > from SPEC CPU17 workloads:
> >
> > +------------+-----------------------+------------------+
> > | Zlib Level |  Canned Latency Gain  |    Comp Ratio    |
> > |            +----------+------------+---------+--------+
> > |            | compress | decompress | dynamic | canned |
> > +------------+----------+------------+---------+--------+
> > |     1      |   49%    |    29%     |  3.16   |  2.92  |
> > |     6      |   27%    |    28%     |  3.35   |  3.09  |
> > |     9      |   12%    |    29%     |  3.36   |  3.11  |
> > +------------+----------+------------+---------+--------+
>
> So which kernel user (zswap I presume) is clamouring for this
> feature? We don't add new algorithms that have no in-kernel
> users. So we need to be sure that the kernel user actually
> want this.
>
> Thanks,

We recently submitted an RFC to the zswap and zram maintainers and
users to get feedback on by_n compression with Intel IAA [1]. This
work supports the efforts to swap in/out large and multi-sized folios.
With by_n compression, we have created a scheme that allows parallel
IAA compression and decompression operations on a single folio,
resulting in performance gains. Currently, the by_n scheme uses the
canned-mode compression algorithm to perform the compression and
decompression operations. Using canned mode reduces compression
latency because the deflate header doesn't need to be created
dynamically, while also producing a better compression ratio than
Deflate fixed mode. We would appreciate your feedback on this scheme.
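To make the by_n idea concrete, here is a rough software sketch. It assumes Python's zlib as a stand-in for the IAA hardware: the folio is split into n equal chunks, each chunk is compressed as an independent raw deflate stream on a thread pool, and each stream can later be decompressed on its own. The names compress_by_n and decompress_by_n are hypothetical and not taken from the RFC. zlib has no equivalent of IAA canned mode; the strategy argument only chooses between Z_DEFAULT_STRATEGY (zlib picks the block type, typically dynamic Huffman codes with a per-block header) and Z_FIXED (Deflate fixed mode).

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

FOLIO_SIZE = 64 * 1024  # 64KB folio, matching the size used in the RFC data


def compress_chunk(chunk: bytes, strategy: int) -> bytes:
    """Compress one chunk as an independent raw deflate stream at level 1."""
    # wbits=-15 means raw deflate (no zlib header/trailer); memLevel=8 is zlib's default.
    comp = zlib.compressobj(1, zlib.DEFLATED, -15, 8, strategy)
    return comp.compress(chunk) + comp.flush()


def compress_by_n(folio: bytes, n: int,
                  strategy: int = zlib.Z_DEFAULT_STRATEGY) -> list[bytes]:
    """Hypothetical by_n helper: split the folio into n equal chunks and
    compress them concurrently, one job per chunk."""
    if n <= 0 or len(folio) % n:
        raise ValueError("n must be positive and divide the folio length")
    size = len(folio) // n
    chunks = [folio[i * size:(i + 1) * size] for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda c: compress_chunk(c, strategy), chunks))


def decompress_by_n(parts: list[bytes]) -> bytes:
    """Each part is a self-contained stream, so parts decompress in parallel."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        return b"".join(pool.map(lambda p: zlib.decompress(p, -15), parts))


def ratio(folio: bytes, parts: list[bytes]) -> float:
    """Uncompressed size divided by the total size of all compressed parts."""
    return len(folio) / sum(len(p) for p in parts)


if __name__ == "__main__":
    folio = (b"zswap/zram page payload " * 4096)[:FOLIO_SIZE]
    for n in (1, 2, 4, 8, 16):
        parts = compress_by_n(folio, n)
        assert decompress_by_n(parts) == folio
        print(f"by_{n}: ratio {ratio(folio, parts):.2f}")
```

Because every chunk is an independent stream, a larger n gives more parallelism, but each chunk carries its own stream overhead and a shorter history window. That is likely part of why the compression ratio drops as n grows in the zram data.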
Here is data from the RFC showing a performance comparison for 64KB
folio swap in/out with zram on Sapphire Rapids, whose core frequency
is fixed at 2500MHz:

+------------+-------------+---------+-------------+----------+--------+
|            | Compression | Decomp  | Compression |   zram   |  zram  |
| Algorithm  | latency     | latency | ratio       |  write   |  read  |
+------------+-------------+---------+-------------+----------+--------+
|            |      Median (ns)      |             |    Median (ns)    |
+------------+-------------+---------+-------------+----------+--------+
| IAA by_1   |      34,493 |  20,038 |        2.93 |   40,130 | 24,478 |
| IAA by_2   |      18,830 |  11,888 |        2.93 |   24,149 | 15,536 |
| IAA by_4   |      11,364 |   8,146 |        2.90 |   16,735 | 11,469 |
| IAA by_8   |       8,344 |   6,342 |        2.77 |   13,527 |  9,177 |
| IAA by_16  |       8,837 |   6,549 |        2.33 |   15,309 |  9,547 |
| IAA by_32  |      11,153 |   9,641 |        2.19 |   16,457 | 14,086 |
| IAA by_64  |      18,272 |  16,696 |        1.96 |   24,294 | 20,048 |
+------------+-------------+---------+-------------+----------+--------+
| lz4        |     139,190 |  33,687 |        2.40 |  144,940 | 37,312 |
+------------+-------------+---------+-------------+----------+--------+
| lzo-rle    |     138,235 |  61,055 |        2.52 |  143,666 | 64,321 |
+------------+-------------+---------+-------------+----------+--------+
| zstd       |     251,820 |  90,878 |        3.40 |  256,384 | 94,328 |
+------------+-------------+---------+-------------+----------+--------+

[1] https://lore.kernel.org/all/cover.1714581792.git.andre.glover@linux.intel.com/
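One way to read the zram table is to compute latency speedups directly from the medians. The short Python sketch below does this with the numbers copied from the table. The helpers speedup and fastest_iaa are illustrative only and are not part of the RFC. On this data, by_8 has the lowest median compression and decompression latency. Its compression is about 4.1x faster than by_1, while the ratio falls from 2.93 to 2.77. Beyond by_8, latency rises again and the ratio keeps dropping.

```python
# Median latencies in ns, (compress, decompress), copied from the zram
# table (64KB folios, Sapphire Rapids fixed at 2500MHz).
LATENCY_NS = {
    "IAA by_1": (34_493, 20_038),
    "IAA by_2": (18_830, 11_888),
    "IAA by_4": (11_364, 8_146),
    "IAA by_8": (8_344, 6_342),
    "IAA by_16": (8_837, 6_549),
    "IAA by_32": (11_153, 9_641),
    "IAA by_64": (18_272, 16_696),
    "lz4": (139_190, 33_687),
    "lzo-rle": (138_235, 61_055),
    "zstd": (251_820, 90_878),
}


def speedup(baseline: str, candidate: str) -> tuple[float, float]:
    """Return (compress, decompress) latency ratios baseline / candidate."""
    base_c, base_d = LATENCY_NS[baseline]
    cand_c, cand_d = LATENCY_NS[candidate]
    return base_c / cand_c, base_d / cand_d


def fastest_iaa(column: int) -> str:
    """IAA by_n row with the lowest latency in a column (0=compress, 1=decompress)."""
    iaa_rows = [name for name in LATENCY_NS if name.startswith("IAA")]
    return min(iaa_rows, key=lambda name: LATENCY_NS[name][column])


if __name__ == "__main__":
    best = fastest_iaa(0)
    for baseline in ("IAA by_1", "lz4", "lzo-rle", "zstd"):
        comp, decomp = speedup(baseline, best)
        print(f"{best} vs {baseline}: {comp:.2f}x compress, {decomp:.2f}x decompress")
```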