> -----Original Message-----
> From: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Sent: Sunday, September 29, 2024 2:15 PM
> To: Yosry Ahmed <yosryahmed@xxxxxxxxxx>; Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; nphamcs@xxxxxxxxx;
> chengming.zhou@xxxxxxxxx; usamaarif642@xxxxxxxxx; shakeel.butt@xxxxxxxxx;
> ryan.roberts@xxxxxxx; Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx;
> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@xxxxxxxxx>;
> Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>;
> Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Subject: RE: [PATCH v8 6/8] mm: zswap: Support large folios in zswap_store().
>
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > Sent: Saturday, September 28, 2024 11:11 AM
> > To: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Cc: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> > linux-mm@xxxxxxxxx; nphamcs@xxxxxxxxx; chengming.zhou@xxxxxxxxx;
> > usamaarif642@xxxxxxxxx; shakeel.butt@xxxxxxxxx; ryan.roberts@xxxxxxx;
> > Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux-foundation.org;
> > Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> > Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> > Subject: Re: [PATCH v8 6/8] mm: zswap: Support large folios in zswap_store().
> >
> > On Sat, Sep 28, 2024 at 7:15 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > >
> > > On Fri, Sep 27, 2024 at 08:42:16PM -0700, Yosry Ahmed wrote:
> > > > On Fri, Sep 27, 2024 at 7:16 PM Kanchana P Sridhar
> > > > >  {
> > > > > +	struct page *page = folio_page(folio, index);
> > > > >  	swp_entry_t swp = folio->swap;
> > > > > -	pgoff_t offset = swp_offset(swp);
> > > > >  	struct xarray *tree = swap_zswap_tree(swp);
> > > > > +	pgoff_t offset = swp_offset(swp) + index;
> > > > >  	struct zswap_entry *entry, *old;
> > > > > -	struct obj_cgroup *objcg = NULL;
> > > > > -	struct mem_cgroup *memcg = NULL;
> > > > > -
> > > > > -	VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > > > > -	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> > > > > +	int type = swp_type(swp);
> > > >
> > > > Why do we need type? We use it when initializing entry->swpentry to
> > > > reconstruct the swp_entry_t we already have.
> > >
> > > It's not the same entry. folio->swap points to the head entry, this
> > > function has to store swap entries with the offsets of each subpage.
> >
> > Duh, yeah, thanks.
> >
> > > Given the name of this function, it might be better to actually pass a
> > > page pointer to it; do the folio_page() inside zswap_store().
> > >
> > > Then do
> > >
> > >	entry->swpentry = page_swap_entry(page);
> > >
> > > below.
> >
> > That is indeed clearer.
> >
> > Although this will be adding yet another caller of page_swap_entry()
> > that already has the folio, yet it calls page_swap_entry() for each
> > page in the folio, which calls page_folio() inside.
> >
> > I wonder if we should add (or replace page_swap_entry()) with a
> > folio_swap_entry(folio, index) helper. This can also be done as a
> > follow up.
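
For concreteness, a helper along the lines Yosry floats above might look
something like the sketch below. This is only an illustration, not an
existing helper: the body simply mirrors what page_swap_entry() already
does (subpage swap entries are consecutive after the head entry), minus
the page_folio() lookup, since the caller already has the folio:

/*
 * Illustrative sketch only: the swap entry of subpage 'index' is the
 * folio's head entry plus 'index'.
 */
static inline swp_entry_t folio_swap_entry(struct folio *folio, long index)
{
	swp_entry_t entry = folio->swap;

	entry.val += index;
	return entry;
}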
>
> Thanks Johannes and Yosry for these comments. I was thinking about
> this some more. In its current form, zswap_store_page() is called in
> the context of the folio by passing in a [folio, index]. This implies
> a key assumption about the existing zswap_store() large folios
> functionality, i.e., we do the per-page store for the page at an
> "index * PAGE_SIZE" offset within the folio, and not for any arbitrary
> page. Further, we need the folio for folio_nid(); but this can also be
> computed from the page. Another reason why I thought the existing
> signature might be preferable is that it seems to enable getting the
> entry's swp_entry_t with fewer computes. Could calling page_swap_entry()
> add more computes, which could potentially add up (say, 512 times)?

I went ahead and quantified this with the v8 signature of zswap_store_page()
and the suggested changes for this function to take a page and use
page_swap_entry(). I ran usemem with 2M pmd-mappable folios enabled. The
results indicate that the page_swap_entry() implementation is slightly
better in throughput and latency:

v8:
                               run1        run2        run3     average
 -----------------------------------------------------------------------
 Total throughput (KB/s):   6,483,835   6,396,760   6,349,532   6,410,042
 Average throughput (KB/s):   216,127     213,225     211,651     213,889
 elapsed time (sec):           107.75      107.06      109.99      108.87
 sys time (sec):             2,476.43    2,453.99    2,551.52    2,513.98
 -----------------------------------------------------------------------

page_swap_entry():
                               run1        run2        run3     average
 -----------------------------------------------------------------------
 Total throughput (KB/s):   6,462,954   6,396,134   6,418,076   6,425,721
 Average throughput (KB/s):   215,431     213,204     213,935     214,683
 elapsed time (sec):           108.67      109.46      107.91      108.29
 sys time (sec):             2,473.65    2,493.33    2,507.82    2,490.74
 -----------------------------------------------------------------------

Based on this, I will go ahead and implement the change suggested by
Johannes and submit a v9.

Thanks,
Kanchana

> I would appreciate your thoughts on whether these are valid considerations,
> and can proceed accordingly.
>
> > > > >  		obj_cgroup_put(objcg);
> > > > > -	if (zswap_pool_reached_full)
> > > > > -		queue_work(shrink_wq, &zswap_shrink_work);
> > > > > -check_old:
> > > > > +	return false;
> > > > > +}
> > > > > +
> > > > > +bool zswap_store(struct folio *folio)
> > > > > +{
> > > > > +	long nr_pages = folio_nr_pages(folio);
> > > > > +	swp_entry_t swp = folio->swap;
> > > > > +	struct xarray *tree = swap_zswap_tree(swp);
> > > > > +	pgoff_t offset = swp_offset(swp);
> > > > > +	struct obj_cgroup *objcg = NULL;
> > > > > +	struct mem_cgroup *memcg = NULL;
> > > > > +	struct zswap_pool *pool;
> > > > > +	size_t compressed_bytes = 0;
> > > >
> > > > Why size_t? entry->length is int.
> > >
> > > In light of Willy's comment, I think size_t is a good idea.
> >
> > Agreed.
>
> Thanks Yosry, Matthew and Johannes for the resolution on this!
>
> Thanks,
> Kanchana
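
For reference, the change being adopted for v9 is the one Johannes outlines
above: zswap_store_page() takes the subpage itself, and the swap entry is
derived with page_swap_entry() instead of the type/offset arithmetic. The
rough shape might look like the sketch below; this is only illustrative
(not the actual v9 patch), with allocation, compression and tree insertion
elided:

static bool zswap_store_page(struct page *page,
			     struct obj_cgroup *objcg,
			     struct zswap_pool *pool)
{
	swp_entry_t page_swpentry = page_swap_entry(page);
	struct zswap_entry *entry;

	/*
	 * ... allocate the zswap_entry on the node of the page
	 * (page_to_nid(page) rather than folio_nid(folio)), compress,
	 * charge the objcg and take a pool reference, as in v8 ...
	 */

	entry->swpentry = page_swpentry;

	/*
	 * ... store into swap_zswap_tree(page_swpentry) at
	 * swp_offset(page_swpentry), LRU add, stats updates ...
	 */

	return true;
}

while the per-page loop in zswap_store() would do the folio_page() lookup
on the caller side, roughly:

	for (index = 0; index < nr_pages; ++index) {
		if (!zswap_store_page(folio_page(folio, index), objcg, pool))
			break;	/* error/cleanup path elided */
	}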