RE: [PATCH v8 6/8] mm: zswap: Support large folios in zswap_store().

"Sridhar, Kanchana P" <kanchana.p.sridhar@xxxxxxxxx> · Mon, 30 Sep 2024 17:55:44 +0000

> -----Original Message-----
> From: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Sent: Sunday, September 29, 2024 2:15 PM
> To: Yosry Ahmed <yosryahmed@xxxxxxxxxx>; Johannes Weiner
> <hannes@xxxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> nphamcs@xxxxxxxxx; chengming.zhou@xxxxxxxxx;
> usamaarif642@xxxxxxxxx; shakeel.butt@xxxxxxxxx; ryan.roberts@xxxxxxx;
> Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux-
> foundation.org; Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>;
> Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Subject: RE: [PATCH v8 6/8] mm: zswap: Support large folios in zswap_store().
> 
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > Sent: Saturday, September 28, 2024 11:11 AM
> > To: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Cc: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; linux-
> > kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; nphamcs@xxxxxxxxx;
> > chengming.zhou@xxxxxxxxx; usamaarif642@xxxxxxxxx;
> > shakeel.butt@xxxxxxxxx; ryan.roberts@xxxxxxx; Huang, Ying
> > <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux-
> foundation.org;
> > Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> > <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> > Subject: Re: [PATCH v8 6/8] mm: zswap: Support large folios in
> zswap_store().
> >
> > On Sat, Sep 28, 2024 at 7:15 AM Johannes Weiner <hannes@xxxxxxxxxxx>
> > wrote:
> > >
> > > On Fri, Sep 27, 2024 at 08:42:16PM -0700, Yosry Ahmed wrote:
> > > > On Fri, Sep 27, 2024 at 7:16 PM Kanchana P Sridhar
> > > > >  {
> > > > > +       struct page *page = folio_page(folio, index);
> > > > >         swp_entry_t swp = folio->swap;
> > > > > -       pgoff_t offset = swp_offset(swp);
> > > > >         struct xarray *tree = swap_zswap_tree(swp);
> > > > > +       pgoff_t offset = swp_offset(swp) + index;
> > > > >         struct zswap_entry *entry, *old;
> > > > > -       struct obj_cgroup *objcg = NULL;
> > > > > -       struct mem_cgroup *memcg = NULL;
> > > > > -
> > > > > -       VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > > > > -       VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> > > > > +       int type = swp_type(swp);
> > > >
> > > > Why do we need type? We use it when initializing entry->swpentry to
> > > > reconstruct the swp_entry_t we already have.
> > >
> > > It's not the same entry. folio->swap points to the head entry, this
> > > function has to store swap entries with the offsets of each subpage.
> >
> > Duh, yeah, thanks.
> >
> > >
> > > Given the name of this function, it might be better to actually pass a
> > > page pointer to it; do the folio_page() inside zswap_store().
> > >
> > > Then do
> > >
> > >                 entry->swpentry = page_swap_entry(page);
> > >
> > > below.
> >
> > That is indeed clearer.
> >
> > Although this will be adding yet another caller of page_swap_entry()
> > that already has the folio, yet it calls page_swap_entry() for each
> > page in the folio, which calls page_folio() inside.
> >
> > I wonder if we should add (or replace page_swap_entry()) with a
> > folio_swap_entry(folio, index) helper. This can also be done as a
> > follow up.
> 
> Thanks Johannes and Yosry for these comments. I was thinking about
> this some more. In its current form, zswap_store_page() is called in the
> context
> of the folio by passing in a [folio, index]. This implies a key assumption about
> the existing zswap_store() large folios functionality, i.e., we do the per-page
> store for the page at a "index * PAGE_SIZE" within the folio, and not for any
> arbitrary page. Further, we need the folio for folio_nid(); but this can also be
> computed from the page. Another reason why I thought the existing signature
> might be preferable is because it seems like it enables getting the entry's
> swp_entry_t with fewer computes. Could calling page_swap_entry() add
> more computes; which if it is the case, could potentially add up (say 512
> times)

I went ahead and quantified this with the v8 signature of zswap_store_page()
and the suggested changes for this function to take a page and use
page_swap_entry(). I ran usemem with 2M pmd-mappable folios enabled.
The results indicate that the page_swap_entry() implementation is slightly
better in throughput and latency:

v8:                             run1       run2       run3    average
---------------------------------------------------------------------
Total throughput (KB/s):   6,483,835  6,396,760  6,349,532  6,410,042
Average throughput (KB/s):   216,127    213,225	   211,651    213,889
elapsed time (sec):           107.75     107.06	    109.99     108.87
sys time (sec):             2,476.43   2,453.99	  2,551.52   2,513.98
---------------------------------------------------------------------

page_swap_entry():              run1       run2       run3    average
---------------------------------------------------------------------
Total throughput (KB/s):   6,462,954  6,396,134  6,418,076  6,425,721
Average throughput (KB/s):   215,431    213,204	   213,935    214,683
elapsed time (sec):           108.67     109.46	    107.91     108.29
sys time (sec):             2,473.65   2,493.33	  2,507.82   2,490.74
---------------------------------------------------------------------------

Based on this, I will go ahead and implement the change suggested
by Johannes and submit a v9.

Thanks,
Kanchana

> 
> I would appreciate your thoughts on whether these are valid considerations,
> and can proceed accordingly.
> 
> >
> > >
> > > > >         obj_cgroup_put(objcg);
> > > > > -       if (zswap_pool_reached_full)
> > > > > -               queue_work(shrink_wq, &zswap_shrink_work);
> > > > > -check_old:
> > > > > +       return false;
> > > > > +}
> > > > > +
> > > > > +bool zswap_store(struct folio *folio)
> > > > > +{
> > > > > +       long nr_pages = folio_nr_pages(folio);
> > > > > +       swp_entry_t swp = folio->swap;
> > > > > +       struct xarray *tree = swap_zswap_tree(swp);
> > > > > +       pgoff_t offset = swp_offset(swp);
> > > > > +       struct obj_cgroup *objcg = NULL;
> > > > > +       struct mem_cgroup *memcg = NULL;
> > > > > +       struct zswap_pool *pool;
> > > > > +       size_t compressed_bytes = 0;
> > > >
> > > > Why size_t? entry->length is int.
> > >
> > > In light of Willy's comment, I think size_t is a good idea.
> >
> > Agreed.
> 
> Thanks Yosry, Matthew and Johannes for the resolution on this!
> 
> Thanks,
> Kanchana