On Mon, Sep 23, 2024 at 6:17 PM Kanchana P Sridhar
<kanchana.p.sridhar@xxxxxxxxx> wrote:
>
> zswap_store() will now store mTHP and PMD-size THP folios by compressing
> them page by page.
>
> This patch provides a sequential implementation of storing an mTHP in
> zswap_store() by iterating through each page in the folio to compress
> and store it in the zswap zpool.
>
> Towards this goal, zswap_compress() is modified to take a page instead
> of a folio as input.
>
> Each page's swap offset is stored as a separate zswap entry.
>
> If an error is encountered during the store of any page in the mTHP,
> all previous pages/entries stored will be invalidated. Thus, an mTHP
> is either entirely stored in ZSWAP, or entirely not stored in ZSWAP.
>
> This forms the basis for building batching of pages during zswap store
> of large folios by compressing batches of up to say, 8 pages in an
> mTHP in parallel in hardware, with the Intel In-Memory Analytics
> Accelerator (Intel IAA).
>
> A new config variable CONFIG_ZSWAP_STORE_THP_DEFAULT_ON (off by default)
> will enable/disable zswap storing of (m)THP. The corresponding tunable
> zswap module parameter is "mthp_enabled".
>
> This change reuses and adapts the functionality in Ryan Roberts' RFC
> patch [1]:
>
>   "[RFC,v1] mm: zswap: Store large folios without splitting"
>
>   [1] https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@xxxxxxx/T/#u
>
> Also, addressed some of the RFC comments from the discussion in [1].
>
> Co-developed-by: Ryan Roberts
> Signed-off-by:
> Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx>
> ---
>  mm/Kconfig |   8 ++++
>  mm/zswap.c | 122 +++++++++++++++++++++++++----------------------------
>  2 files changed, 66 insertions(+), 64 deletions(-)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 09aebca1cae3..c659fb732ec4 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -59,6 +59,14 @@ config ZSWAP_SHRINKER_DEFAULT_ON
>  	  reducing the chance that cold pages will reside in the zswap pool
>  	  and consume memory indefinitely.
>
> +config ZSWAP_STORE_THP_DEFAULT_ON
> +	bool "Store mTHP and THP folios in zswap"
> +	depends on ZSWAP
> +	default n
> +	help
> +	  If selected, zswap will process mTHP and THP folios by
> +	  compressing and storing each 4K page in the large folio.
> +
>  choice
>  	prompt "Default compressor"
>  	depends on ZSWAP
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 8f2e0ab34c84..16ab770546d6 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -127,6 +127,14 @@ static bool zswap_shrinker_enabled = IS_ENABLED(
>  		CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
>  module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
>
> +/*
> + * Enable/disable zswap processing of mTHP folios.
> + * For now, only zswap_store will process mTHP folios.
> + */
> +static bool zswap_mthp_enabled = IS_ENABLED(
> +		CONFIG_ZSWAP_STORE_THP_DEFAULT_ON);
> +module_param_named(mthp_enabled, zswap_mthp_enabled, bool, 0644);
> +

Hmm, so this is a runtime knob. Also, should this be zswap_thp_enabled? :)
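
Since the param is 0644, it can also flip in the middle of a store. Just
a sketch (the helper name below is made up, not part of this patch), but
snapshotting the knob once per call would give zswap_store() one
consistent view of it:

static bool zswap_can_store_folio(struct folio *folio)
{
	/*
	 * READ_ONCE() reads the runtime knob exactly once, so a
	 * concurrent toggle cannot be observed with two different
	 * values within a single zswap_store() call.
	 */
	return READ_ONCE(zswap_mthp_enabled) || !folio_test_large(folio);
}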
>  bool zswap_is_enabled(void)
>  {
>  	return zswap_enabled;
> @@ -1471,9 +1479,9 @@ static void zswap_delete_stored_offsets(struct xarray *tree,
>   * @objcg: The folio's objcg.
>   * @pool: The zswap_pool to store the compressed data for the page.
>   */
> -static bool __maybe_unused zswap_store_page(struct folio *folio, long index,
> -					    struct obj_cgroup *objcg,
> -					    struct zswap_pool *pool)
> +static bool zswap_store_page(struct folio *folio, long index,
> +			     struct obj_cgroup *objcg,
> +			     struct zswap_pool *pool)
>  {
>  	swp_entry_t swp = folio->swap;
>  	int type = swp_type(swp);
> @@ -1551,51 +1559,63 @@ static bool __maybe_unused zswap_store_page(struct folio *folio, long index,
>  	return false;
>  }
>
> +/*
> + * Modified to store mTHP folios. Each page in the mTHP will be compressed
> + * and stored sequentially.
> + */
>  bool zswap_store(struct folio *folio)
>  {
>  	long nr_pages = folio_nr_pages(folio);
>  	swp_entry_t swp = folio->swap;
>  	pgoff_t offset = swp_offset(swp);
>  	struct xarray *tree = swap_zswap_tree(swp);
> -	struct zswap_entry *entry;
>  	struct obj_cgroup *objcg = NULL;
>  	struct mem_cgroup *memcg = NULL;
> +	struct zswap_pool *pool;
> +	bool ret = false;
> +	long index;
>
>  	VM_WARN_ON_ONCE(!folio_test_locked(folio));
>  	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
>
> -	/* Large folios aren't supported */
> -	if (folio_test_large(folio))
> +	/* Storing large folios isn't enabled */
> +	if (!zswap_mthp_enabled && folio_test_large(folio))
>  		return false;

Hmm, can this go wrong somehow? Can we have a case where we enable
zswap_mthp_enabled, have a large folio written to zswap, disable
zswap_mthp_enabled, and then attempt to store that folio to zswap
again? Now we have a stale copy in zswap that is not invalidated...?
Or am I missing something here :)
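
If that is a real window, one way to close it might be to reuse
zswap_delete_stored_offsets() from earlier in this patch on the
rejection path. Just a sketch, assuming that helper takes
(tree, offset, nr_pages), which isn't visible in the hunk above:

	/* Storing large folios isn't enabled */
	if (!zswap_mthp_enabled && folio_test_large(folio)) {
		/*
		 * Drop any entries stored for these offsets while the
		 * knob was still enabled, so a stale compressed copy
		 * cannot outlive this rejected store.
		 */
		zswap_delete_stored_offsets(tree, offset, nr_pages);
		return false;
	}

Whether the invalidation belongs here or in the shared error path at
the end of zswap_store() is a design choice; the point is just that a
rejected store has to account for entries written while the knob was
on.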