Hi Chengming, > -----Original Message----- > From: Chengming Zhou <chengming.zhou@xxxxxxxxx> > Sent: Monday, September 2, 2024 4:38 AM > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; linux- > kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; hannes@xxxxxxxxxxx; > yosryahmed@xxxxxxxxxx; nphamcs@xxxxxxxxx; usamaarif642@xxxxxxxxx; > ryan.roberts@xxxxxxx; Huang, Ying <ying.huang@xxxxxxxxx>; > 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx > Cc: Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K > <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx> > Subject: Re: [PATCH v6 2/3] mm: zswap: zswap_store() extended to handle > mTHP folios. > > On 2024/8/30 05:27, Kanchana P Sridhar wrote: > > zswap_store() will now process and store mTHP and PMD-size THP folios. > > > > A new config variable CONFIG_ZSWAP_STORE_THP_DEFAULT_ON (off by > default) > > will enable/disable zswap storing of (m)THP. > > > > This change reuses and adapts the functionality in Ryan Roberts' RFC > > patch [1]: > > > > "[RFC,v1] mm: zswap: Store large folios without splitting" > > > > [1] https://lore.kernel.org/linux-mm/20231019110543.3284654-1- > ryan.roberts@xxxxxxx/T/#u > > > > This patch provides a sequential implementation of storing an mTHP in > > zswap_store() by iterating through each page in the folio to compress > > and store it in the zswap zpool. > > > > Towards this goal, zswap_compress() is modified to take a page instead > > of a folio as input. > > > > Each page's swap offset is stored as a separate zswap entry. > > > > If an error is encountered during the store of any page in the mTHP, > > all previous pages/entries stored will be invalidated. Thus, an mTHP > > is either entirely stored in ZSWAP, or entirely not stored in ZSWAP. > > > > This forms the basis for building batching of pages during zswap store > > of large folios, by compressing batches of up to say, 8 pages in an > > mTHP in parallel in hardware, with the Intel In-Memory Analytics > > Accelerator (Intel IAA). > > > > Also, addressed some of the RFC comments from the discussion in [1]. > > > > Made a minor edit in the comments for "struct zswap_entry" to delete > > the comments related to "value" since same-filled page handling has > > been removed from zswap. > > > > Co-developed-by: Ryan Roberts > > Signed-off-by: > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx> > > The code looks ok, but I also find this patch is a little hard to > review, maybe it's better to split it into small patches as Yosry suggested. Definitely, will do so and submit a v7. Thanks, Kanchana > > Thanks! > > [...] > > + > > +/* > > + * Modified to store mTHP folios. Each page in the mTHP will be > compressed > > + * and stored sequentially. > > + */ > > +bool zswap_store(struct folio *folio) > > +{ > > + long nr_pages = folio_nr_pages(folio); > > + swp_entry_t swp = folio->swap; > > + pgoff_t offset = swp_offset(swp); > > + struct xarray *tree = swap_zswap_tree(swp); > > + struct obj_cgroup *objcg = NULL; > > + struct mem_cgroup *memcg = NULL; > > + struct zswap_pool *pool; > > + bool ret = false; > > + long index; > > + > > + VM_WARN_ON_ONCE(!folio_test_locked(folio)); > > + VM_WARN_ON_ONCE(!folio_test_swapcache(folio)); > > + > > + /* Storing large folios isn't enabled */ > > + if (!zswap_mthp_enabled && folio_test_large(folio)) > > + return false; > > + > > + if (!zswap_enabled) > > + goto reject; > > + > > /* > > - * If the zswap store fails or zswap is disabled, we must invalidate the > > - * possibly stale entry which was previously stored at this offset. > > - * Otherwise, writeback could overwrite the new data in the swapfile. > > + * Check cgroup limits: > > + * > > + * The cgroup zswap limit check is done once at the beginning of an > > + * mTHP store, and not within zswap_store_page() for each page > > + * in the mTHP. We do however check the zswap pool limits at the > > + * start of zswap_store_page(). What this means is, the cgroup > > + * could go over the limits by at most (HPAGE_PMD_NR - 1) pages. > > + * However, the per-store-page zswap pool limits check should > > + * hopefully trigger the cgroup aware and zswap LRU aware global > > + * reclaim implemented in the shrinker. If this assumption holds, > > + * the cgroup exceeding the zswap limits could potentially be > > + * resolved before the next zswap_store, and if it is not, the next > > + * zswap_store would fail the cgroup zswap limit check at the start. > > */ > > - entry = xa_erase(tree, offset); > > - if (entry) > > - zswap_entry_free(entry); > > - return false; > > + objcg = get_obj_cgroup_from_folio(folio); > > + if (objcg && !obj_cgroup_may_zswap(objcg)) { > > + memcg = get_mem_cgroup_from_objcg(objcg); > > + if (shrink_memcg(memcg)) { > > + mem_cgroup_put(memcg); > > + goto put_objcg; > > + } > > + mem_cgroup_put(memcg); > > + } > > + > > + if (zswap_check_limits()) > > + goto put_objcg; > > + > > + pool = zswap_pool_current_get(); > > + if (!pool) > > + goto put_objcg; > > + > > + if (objcg) { > > + memcg = get_mem_cgroup_from_objcg(objcg); > > + if (memcg_list_lru_alloc(memcg, &zswap_list_lru, > GFP_KERNEL)) { > > + mem_cgroup_put(memcg); > > + goto put_pool; > > + } > > + mem_cgroup_put(memcg); > > + } > > + > > + /* > > + * Store each page of the folio as a separate entry. If we fail to store > > + * a page, unwind by removing all the previous pages we stored. > > + */ > > + for (index = 0; index < nr_pages; ++index) { > > + if (!zswap_store_page(folio, index, objcg, pool)) > > + goto put_pool; > > + } > > + > > + ret = true; > > + > > +put_pool: > > + zswap_pool_put(pool); > > +put_objcg: > > + obj_cgroup_put(objcg); > > + if (zswap_pool_reached_full) > > + queue_work(shrink_wq, &zswap_shrink_work); > > +reject: > > + /* > > + * If the zswap store fails or zswap is disabled, we must invalidate > > + * the possibly stale entries which were previously stored at the > > + * offsets corresponding to each page of the folio. Otherwise, > > + * writeback could overwrite the new data in the swapfile. > > + */ > > + if (!ret) > > + zswap_delete_stored_offsets(tree, offset, nr_pages); > > + > > + return ret; > > } > > > > bool zswap_load(struct folio *folio)