> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Sent: Wednesday, October 23, 2024 11:16 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> hannes@xxxxxxxxxxx; nphamcs@xxxxxxxxx; chengming.zhou@xxxxxxxxx;
> usamaarif642@xxxxxxxxx; ryan.roberts@xxxxxxx; Huang, Ying
> <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> linux-crypto@xxxxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxx;
> davem@xxxxxxxxxxxxx; clabbe@xxxxxxxxxxxx; ardb@xxxxxxxxxx;
> ebiggers@xxxxxxxxxx; surenb@xxxxxxxxxx; Accardi, Kristen C
> <kristen.c.accardi@xxxxxxxxx>; zanussi@xxxxxxxxxx; viro@xxxxxxxxxxxxxxxxxx;
> brauner@xxxxxxxxxx; jack@xxxxxxx; mcgrof@xxxxxxxxxx; kees@xxxxxxxxxx;
> joel.granados@xxxxxxxxxx; bfoster@xxxxxxxxxx; willy@xxxxxxxxxxxxx;
> linux-fsdevel@xxxxxxxxxxxxxxx; Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> Subject: Re: [RFC PATCH v1 00/13] zswap IAA compress batching
>
> On Tue, Oct 22, 2024 at 7:53 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@xxxxxxxxx> wrote:
> >
> > Hi Yosry,
> >
> > > -----Original Message-----
> > > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > > Sent: Tuesday, October 22, 2024 5:57 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> > > hannes@xxxxxxxxxxx; nphamcs@xxxxxxxxx; chengming.zhou@xxxxxxxxx;
> > > usamaarif642@xxxxxxxxx; ryan.roberts@xxxxxxx; Huang, Ying
> > > <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux-foundation.org;
> > > linux-crypto@xxxxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxx;
> > > davem@xxxxxxxxxxxxx; clabbe@xxxxxxxxxxxx; ardb@xxxxxxxxxx;
> > > ebiggers@xxxxxxxxxx; surenb@xxxxxxxxxx; Accardi, Kristen C
> > > <kristen.c.accardi@xxxxxxxxx>; zanussi@xxxxxxxxxx; viro@xxxxxxxxxxxxxxxxxx;
> > > brauner@xxxxxxxxxx; jack@xxxxxxx; mcgrof@xxxxxxxxxx; kees@xxxxxxxxxx;
> > > joel.granados@xxxxxxxxxx; bfoster@xxxxxxxxxx; willy@xxxxxxxxxxxxx;
> > > linux-fsdevel@xxxxxxxxxxxxxxx; Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> > > Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> > > Subject: Re: [RFC PATCH v1 00/13] zswap IAA compress batching
> > >
> > > On Thu, Oct 17, 2024 at 11:41 PM Kanchana P Sridhar
> > > <kanchana.p.sridhar@xxxxxxxxx> wrote:
> > > >
> > > >
> > > > IAA Compression Batching:
> > > > =========================
> > > >
> > > > This RFC patch-series introduces the use of the Intel Analytics Accelerator
> > > > (IAA) for parallel compression of pages in a folio, and for batched reclaim
> > > > of hybrid any-order batches of folios in shrink_folio_list().
> > > >
> > > > The patch-series is organized as follows:
> > > >
> > > >  1) iaa_crypto driver enablers for batching: Relevant patches are tagged
> > > >     with "crypto:" in the subject:
> > > >
> > > >     a) async poll crypto_acomp interface without interrupts.
> > > >     b) crypto testmgr acomp poll support.
> > > >     c) Modifying the default sync_mode to "async" and disabling
> > > >        verify_compress by default, to facilitate users to run IAA easily for
> > > >        comparison with software compressors.
> > > >     d) Changing the cpu-to-iaa mappings to more evenly balance cores to IAA
> > > >        devices.
> > > >     e) Addition of a "global_wq" per IAA, which can be used as a global
> > > >        resource for the socket. If the user configures 2WQs per IAA device,
> > > >        the driver will distribute compress jobs from all cores on the
> > > >        socket to the "global_wqs" of all the IAA devices on that socket, in
> > > >        a round-robin manner. This can be used to improve compression
> > > >        throughput for workloads that see a lot of swapout activity.
> > > >
> > > >  2) Migrating zswap to use async poll in zswap_compress()/decompress().
> > > >  3) A centralized batch compression API that can be used by swap modules.
> > > >  4) IAA compress batching within large folio zswap stores.
> > > >  5) IAA compress batching of any-order hybrid folios in
> > > >     shrink_folio_list(). The newly added "sysctl vm.compress-batchsize"
> > > >     parameter can be used to configure the number of folios in [1, 32] to
> > > >     be reclaimed using compress batching.
> > >
> > > I am still digesting this series but I have some high level questions
> > > that I left on some patches. My intuition though is that we should
> > > drop (5) from the initial proposal as it's most controversial.
> > > Batching reclaim of unrelated folios through zswap *might* make sense,
> > > but it needs a broader conversation and it needs justification on its
> > > own merit, without the rest of the series.
> >
> > Thanks for these suggestions! Sure, I can drop (5) from the initial patch-set.
> > Agree also, this needs a broader discussion.
> >
> > I believe the 4K folios usemem30 data in this patchset does bring across
> > the batching reclaim benefits to provide justification on its own merit. I added
> > the data on batching reclaim with kernel compilation as part of the 4K folios
> > experiments in the IAA decompression batching patch-series [1].
> > Listing it here as well. I will make sure to add this data in subsequent revs.
> >
> > --------------------------------------------------------------------------
> > Kernel compilation in tmpfs/allmodconfig, 2G max memory:
> >
> > No large folios           mm-unstable-10-16-2024  shrink_folio_list()
> >                                                    batching of folios
> > --------------------------------------------------------------------------
> > zswap compressor              zstd   deflate-iaa          deflate-iaa
> > vm.compress-batchsize          n/a           n/a                   32
> > vm.page-cluster                  3             3                    3
> > --------------------------------------------------------------------------
> > real_sec                    783.87        761.69               747.32
> > user_sec                 15,750.07     15,716.69            15,728.39
> > sys_sec                   6,522.32      5,725.28             5,399.44
> > Max_RSS_KB               1,872,640     1,870,848            1,874,432
> >
> > zswpout                 82,364,991    97,739,600          102,780,612
> > zswpin                  21,303,393    27,684,166           29,016,252
> > pswpout                         13           222                  213
> > pswpin                          12           209                  202
> > pgmajfault              17,114,339    22,421,211           23,378,161
> > swap_ra                  4,596,035     5,840,082            6,231,646
> > swap_ra_hit              2,903,249     3,682,444            3,940,420
> > --------------------------------------------------------------------------
> >
> > The performance improvements seen do depend on compression batching in
> > the swap modules (zswap). The implementation in patch 12 in the compress
> > batching series sets up this zswap compression pipeline, which takes an array of
> > folios and processes them in batches of 8 pages compressed in parallel in hardware.
> > That being said, we do see latency improvements even with reclaim batching
> > combined with zswap compress batching with zstd/lzo-rle/etc. I haven't done a
> > lot of analysis of this, but I am guessing fewer calls from the swap layer
> > (swap_writepage()) into zswap could have something to do with this. If we believe
> > that batching can be the right thing to do even for the software compressors,
> > I can gather batching data with zstd for v2.
>
> Thanks for sharing the data. What I meant is, I think we should focus
> on supporting large folio compression batching for this series, and
> only present figures for this support to avoid confusion.
>
> Once this lands, we can discuss support for batching the compression
> of different unrelated folios separately, as it spans areas beyond
> just zswap and will need broader discussion.

Absolutely, this makes sense, thanks Yosry! I will address this in v2.

Thanks,
Kanchana
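
For readers comparing the three columns of the kernel-compilation table quoted above, the relative time savings can be worked out with a few lines of arithmetic. The following is only a minimal sketch: the timing figures are copied from the table, while the names `TIMES`, `pct_reduction` and `summarize` are illustrative and not part of the patch series.

```python
# Timings (seconds) copied from the kernel-compilation table in this thread.
# Column order: zstd, deflate-iaa, deflate-iaa with vm.compress-batchsize=32.
# The dictionary and function names are hypothetical, chosen for illustration.
TIMES = {
    "real_sec": (783.87, 761.69, 747.32),
    "user_sec": (15750.07, 15716.69, 15728.39),
    "sys_sec": (6522.32, 5725.28, 5399.44),
}


def pct_reduction(baseline, value):
    """Return the percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100.0


def summarize(times):
    """Map each metric to (reduction vs zstd, reduction vs unbatched deflate-iaa)
    for the batched deflate-iaa column."""
    return {
        metric: (pct_reduction(zstd, batched), pct_reduction(iaa, batched))
        for metric, (zstd, iaa, batched) in times.items()
    }


if __name__ == "__main__":
    for metric, (vs_zstd, vs_iaa) in summarize(TIMES).items():
        print(f"{metric}: {vs_zstd:.2f}% vs zstd, {vs_iaa:.2f}% vs deflate-iaa")
```

A negative result would mean the batched column is higher than the baseline, which is the case for user_sec relative to unbatched deflate-iaa.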