Re: [PATCH 0/4] crypto: Add new compression modes for zlib and IAA

Andre Glover <andre.glover@xxxxxxxxxxxxxxx> · Wed, 01 May 2024 15:07:15 -0700

Hi Herbert,

On Fri, 2024-04-05 at 15:07 +0800, Herbert Xu wrote:
> On Thu, Mar 28, 2024 at 10:44:41AM -0700, Andre Glover wrote:
> > 
> > Below is a table showing the latency improvements with zlib,
> > between zlib dynamic and zlib canned modes, and the compression
> > ratio for each mode while using a set of 4300 4KB pages sampled
> > from SPEC CPU17 workloads:
> > _________________________________________________________
> > | Zlib Level |  Canned Latency Gain  |    Comp Ratio    |
> > |------------|-----------------------|------------------|
> > |            | compress | decompress | dynamic | canned |
> > |____________|__________|____________|_________|________|
> > |     1      |    49%   |    29%     |  3.16   |  2.92  |
> > |------------|----------|------------|---------|--------|
> > |     6      |    27%   |    28%     |  3.35   |  3.09  |
> > |------------|----------|------------|---------|--------|
> > |     9      |    12%   |    29%     |  3.36   |  3.11  |
> > |____________|__________|____________|_________|________|
> 
> So which kernel user (zswap I presume) is clamouring for this
> feature? We don't add new algorithms that have no in-kernel
> users.  So we need to be sure that the kernel user actually
> want this.
> 
> Thanks,

We have recently submitted an RFC [1] to the zswap and zram maintainers
and users, asking for feedback on by_n compression with Intel IAA. This
work supports efforts to swap in/out large and multi-sized folios. The
by_n scheme allows parallel IAA compression and decompression operations
on a single folio, resulting in performance gains. Currently the by_n
scheme uses the canned mode compression algorithm for both compression
and decompression. Canned mode reduces compression latency because the
deflate header doesn't need to be created dynamically, while also
producing a better compression ratio than Deflate Fixed mode. We would
appreciate your feedback on this scheme.
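
To make the by_n idea concrete, here is a minimal userspace sketch in
Python. It is not the IAA or kernel implementation: it uses software zlib
in place of IAA canned mode, a thread pool in place of parallel IAA
engines, and the helper names by_n_compress and by_n_decompress are made
up for illustration. It only shows the shape of the scheme: a folio is
split into n equal chunks that are compressed and decompressed
independently of each other.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def by_n_compress(folio: bytes, n: int, level: int = 6) -> list[bytes]:
    """Split `folio` into n equal chunks and compress each one on its own.

    Illustrative only: software zlib stands in for IAA canned mode.
    """
    if n <= 0 or len(folio) % n:
        raise ValueError("folio size must be a positive multiple of n")
    size = len(folio) // n
    chunks = [folio[i:i + size] for i in range(0, len(folio), size)]
    # One worker per chunk, standing in for parallel hardware engines.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))


def by_n_decompress(parts: list[bytes]) -> bytes:
    """Decompress each chunk independently and join them in order."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        return b"".join(pool.map(zlib.decompress, parts))


if __name__ == "__main__":
    # Synthetic 64KB "folio"; real swap data will compress differently.
    folio = (b"swap page payload " * 4096)[:65536]
    for n in (1, 2, 4, 8, 16):
        parts = by_n_compress(folio, n)
        assert by_n_decompress(parts) == folio
        print(n, len(folio) / sum(len(p) for p in parts))
```

Because each chunk is compressed without reference to the others,
smaller chunks give the compressor less history to work with, which
would be consistent with the compression ratio dropping as n grows in
the data below.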

Here is data from the RFC showing a performance comparison for 64KB
folio swap in/out with zram on Sapphire Rapids, whose core frequency is
fixed at 2500MHz:
+------------+-------------+---------+-------------+----------+-------+
|            | Compression | Decomp  | Compression | zram     | zram  |
| Algorithm  | latency     | latency | ratio       | write    | read  |
+------------+-------------+---------+-------------+----------+-------+
|            |       Median (ns)     |             |      Median (ns) |
+------------+-------------+---------+-------------+----------+-------+
|            |             |         |             |          |       |
| IAA by_1   | 34,493      | 20,038  | 2.93        | 40,130   | 24,478|
| IAA by_2   | 18,830      | 11,888  | 2.93        | 24,149   | 15,536|
| IAA by_4   | 11,364      |  8,146  | 2.90        | 16,735   | 11,469|
| IAA by_8   |  8,344      |  6,342  | 2.77        | 13,527   |  9,177|
| IAA by_16  |  8,837      |  6,549  | 2.33        | 15,309   |  9,547|
| IAA by_32  | 11,153      |  9,641  | 2.19        | 16,457   | 14,086|
| IAA by_64  | 18,272      | 16,696  | 1.96        | 24,294   | 20,048|
|            |             |         |             |          |       |
| lz4        | 139,190     | 33,687  | 2.40        | 144,940  | 37,312|
|            |             |         |             |          |       |
| lzo-rle    | 138,235     | 61,055  | 2.52        | 143,666  | 64,321|
|            |             |         |             |          |       |
| zstd       | 251,820     | 90,878  | 3.40        | 256,384  | 94,328|
+------------+-------------+---------+-------------+----------+-------+
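
As an aid for reading the table, the relative gains can be computed
directly from the median latencies above. The snippet below is only
arithmetic on numbers already in the table; the dictionary and function
names are illustrative.

```python
# Median latencies in ns, copied from the table above:
# name -> (compression latency, decompression latency)
LATENCY_NS = {
    "IAA by_1": (34_493, 20_038),
    "IAA by_2": (18_830, 11_888),
    "IAA by_4": (11_364, 8_146),
    "IAA by_8": (8_344, 6_342),
    "IAA by_16": (8_837, 6_549),
    "IAA by_32": (11_153, 9_641),
    "IAA by_64": (18_272, 16_696),
    "lz4": (139_190, 33_687),
    "lzo-rle": (138_235, 61_055),
    "zstd": (251_820, 90_878),
}


def speedup(baseline: str, candidate: str) -> tuple[float, float]:
    """Return (compress, decompress) latency ratios baseline/candidate."""
    base_c, base_d = LATENCY_NS[baseline]
    cand_c, cand_d = LATENCY_NS[candidate]
    return base_c / cand_c, base_d / cand_d


def fastest_compress() -> str:
    """Name of the entry with the lowest median compression latency."""
    return min(LATENCY_NS, key=lambda name: LATENCY_NS[name][0])


if __name__ == "__main__":
    best = fastest_compress()
    for base in ("IAA by_1", "lz4", "lzo-rle", "zstd"):
        comp, decomp = speedup(base, best)
        print(f"{base} vs {best}: {comp:.2f}x compress, {decomp:.2f}x decompress")
```
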

[1] https://lore.kernel.org/all/cover.1714581792.git.andre.glover@linux.intel.com/