> -----Original Message----- > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx> > Sent: Tuesday, October 22, 2024 5:50 PM > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx> > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; > hannes@xxxxxxxxxxx; nphamcs@xxxxxxxxx; chengming.zhou@xxxxxxxxx; > usamaarif642@xxxxxxxxx; ryan.roberts@xxxxxxx; Huang, Ying > <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; > linux-crypto@xxxxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxx; > davem@xxxxxxxxxxxxx; clabbe@xxxxxxxxxxxx; ardb@xxxxxxxxxx; > ebiggers@xxxxxxxxxx; surenb@xxxxxxxxxx; Accardi, Kristen C > <kristen.c.accardi@xxxxxxxxx>; zanussi@xxxxxxxxxx; viro@xxxxxxxxxxxxxxxxxx; > brauner@xxxxxxxxxx; jack@xxxxxxx; mcgrof@xxxxxxxxxx; kees@xxxxxxxxxx; > joel.granados@xxxxxxxxxx; bfoster@xxxxxxxxxx; willy@xxxxxxxxxxxxx; linux- > fsdevel@xxxxxxxxxxxxxxx; Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>; Gopal, > Vinodh <vinodh.gopal@xxxxxxxxx> > Subject: Re: [RFC PATCH v1 09/13] mm: zswap: Config variable to enable > compress batching in zswap_store(). > > On Thu, Oct 17, 2024 at 11:41 PM Kanchana P Sridhar > <kanchana.p.sridhar@xxxxxxxxx> wrote: > > > > Add a new zswap config variable that controls whether zswap_store() will > > compress a batch of pages, for instance, the pages in a large folio: > > > > CONFIG_ZSWAP_STORE_BATCHING_ENABLED > > > > The existing CONFIG_CRYPTO_DEV_IAA_CRYPTO variable added in commit > > ea7a5cbb4369 ("crypto: iaa - Add Intel IAA Compression Accelerator crypto > > driver core") is used to detect if the system has the Intel Analytics > > Accelerator (IAA), and the iaa_crypto module is available. If so, the > > kernel build will prompt for CONFIG_ZSWAP_STORE_BATCHING_ENABLED. > Hence, > > users have the ability to set > CONFIG_ZSWAP_STORE_BATCHING_ENABLED="y" only > > on systems that have Intel IAA. > > > > If CONFIG_ZSWAP_STORE_BATCHING_ENABLED is enabled, and IAA is > configured > > as the zswap compressor, zswap_store() will process the pages in a large > > folio in batches, i.e., multiple pages at a time. Pages in a batch will be > > compressed in parallel in hardware, then stored. On systems without Intel > > IAA and/or if zswap uses software compressors, pages in the batch will be > > compressed sequentially and stored. > > > > The patch also implements a zswap API that returns the status of this > > config variable. > > If we are compressing a large folio and batching is an option, is not > batching ever the correct thing to do? Why is the config option > needed? Thanks Yosry, for the code review comments! This is a good point. The main consideration here was not to impact software compressors run on non-Intel platforms, and only incur the memory footprint cost of multiple acomp_req/buffers in "struct crypto_acomp_ctx" if there is IAA to reduce latency with parallel compressions. If the memory footprint cost if acceptable, there is no reason not to do batching, even if compressions are sequential. We could amortize cost of the cgroup charging/objcg/stats updates. Thanks, Kanchana > > > > > Suggested-by: Ying Huang <ying.huang@xxxxxxxxx> > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx> > > --- > > include/linux/zswap.h | 6 ++++++ > > mm/Kconfig | 12 ++++++++++++ > > mm/zswap.c | 14 ++++++++++++++ > > 3 files changed, 32 insertions(+) > > > > diff --git a/include/linux/zswap.h b/include/linux/zswap.h > > index d961ead91bf1..74ad2a24b309 100644 > > --- a/include/linux/zswap.h > > +++ b/include/linux/zswap.h > > @@ -24,6 +24,7 @@ struct zswap_lruvec_state { > > atomic_long_t nr_disk_swapins; > > }; > > > > +bool zswap_store_batching_enabled(void); > > unsigned long zswap_total_pages(void); > > bool zswap_store(struct folio *folio); > > bool zswap_load(struct folio *folio); > > @@ -39,6 +40,11 @@ bool zswap_never_enabled(void); > > > > struct zswap_lruvec_state {}; > > > > +static inline bool zswap_store_batching_enabled(void) > > +{ > > + return false; > > +} > > + > > static inline bool zswap_store(struct folio *folio) > > { > > return false; > > diff --git a/mm/Kconfig b/mm/Kconfig > > index 33fa51d608dc..26d1a5cee471 100644 > > --- a/mm/Kconfig > > +++ b/mm/Kconfig > > @@ -125,6 +125,18 @@ config ZSWAP_COMPRESSOR_DEFAULT > > default "zstd" if ZSWAP_COMPRESSOR_DEFAULT_ZSTD > > default "" > > > > +config ZSWAP_STORE_BATCHING_ENABLED > > + bool "Batching of zswap stores with Intel IAA" > > + depends on ZSWAP && CRYPTO_DEV_IAA_CRYPTO > > + default n > > + help > > + Enables zswap_store to swapout large folios in batches of 8 pages, > > + rather than a page at a time, if the system has Intel IAA for hardware > > + acceleration of compressions. If IAA is configured as the zswap > > + compressor, this will parallelize batch compression of upto 8 pages > > + in the folio in hardware, thereby improving large folio compression > > + throughput and reducing swapout latency. > > + > > choice > > prompt "Default allocator" > > depends on ZSWAP > > diff --git a/mm/zswap.c b/mm/zswap.c > > index 948c9745ee57..4893302d8c34 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -127,6 +127,15 @@ static bool zswap_shrinker_enabled = > IS_ENABLED( > > CONFIG_ZSWAP_SHRINKER_DEFAULT_ON); > > module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, > 0644); > > > > +/* > > + * Enable/disable batching of compressions if zswap_store is called with a > > + * large folio. If enabled, and if IAA is the zswap compressor, pages are > > + * compressed in parallel in batches of say, 8 pages. > > + * If not, every page is compressed sequentially. > > + */ > > +static bool __zswap_store_batching_enabled = IS_ENABLED( > > + CONFIG_ZSWAP_STORE_BATCHING_ENABLED); > > + > > bool zswap_is_enabled(void) > > { > > return zswap_enabled; > > @@ -241,6 +250,11 @@ static inline struct xarray > *swap_zswap_tree(swp_entry_t swp) > > pr_debug("%s pool %s/%s\n", msg, (p)->tfm_name, \ > > zpool_get_type((p)->zpool)) > > > > +__always_inline bool zswap_store_batching_enabled(void) > > +{ > > + return __zswap_store_batching_enabled; > > +} > > + > > /********************************* > > * pool functions > > **********************************/ > > -- > > 2.27.0 > >