RE: [PATCH v8 6/8] mm: zswap: Support large folios in zswap_store().

> -----Original Message-----
> From: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> Sent: Saturday, September 28, 2024 11:11 AM
> To: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; linux-
> kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; nphamcs@xxxxxxxxx;
> chengming.zhou@xxxxxxxxx; usamaarif642@xxxxxxxxx;
> shakeel.butt@xxxxxxxxx; ryan.roberts@xxxxxxx; Huang, Ying
> <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> Subject: Re: [PATCH v8 6/8] mm: zswap: Support large folios in zswap_store().
> 
> On Sat, Sep 28, 2024 at 7:15 AM Johannes Weiner <hannes@xxxxxxxxxxx>
> wrote:
> >
> > On Fri, Sep 27, 2024 at 08:42:16PM -0700, Yosry Ahmed wrote:
> > > On Fri, Sep 27, 2024 at 7:16 PM Kanchana P Sridhar
> > > >  {
> > > > +       struct page *page = folio_page(folio, index);
> > > >         swp_entry_t swp = folio->swap;
> > > > -       pgoff_t offset = swp_offset(swp);
> > > >         struct xarray *tree = swap_zswap_tree(swp);
> > > > +       pgoff_t offset = swp_offset(swp) + index;
> > > >         struct zswap_entry *entry, *old;
> > > > -       struct obj_cgroup *objcg = NULL;
> > > > -       struct mem_cgroup *memcg = NULL;
> > > > -
> > > > -       VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > > > -       VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> > > > +       int type = swp_type(swp);
> > >
> > > Why do we need type? We use it when initializing entry->swpentry to
> > > reconstruct the swp_entry_t we already have.
> >
> > It's not the same entry. folio->swap points to the head entry, this
> > function has to store swap entries with the offsets of each subpage.
> 
> Duh, yeah, thanks.
> 
> >
> > Given the name of this function, it might be better to actually pass a
> > page pointer to it; do the folio_page() inside zswap_store().
> >
> > Then do
> >
> >                 entry->swpentry = page_swap_entry(page);
> >
> > below.
> 
> That is indeed clearer.
> 
> Although this will be adding yet another caller of page_swap_entry()
> that already has the folio, yet it calls page_swap_entry() for each
> page in the folio, which calls page_folio() inside.
> 
> I wonder if we should add (or replace page_swap_entry()) with a
> folio_swap_entry(folio, index) helper. This can also be done as a
> follow up.

Thanks Johannes and Yosry for these comments. I was thinking about this
some more. In its current form, zswap_store_page() is called in the context
of the folio by passing in [folio, index]. This makes explicit a key assumption
of the existing zswap_store() large-folio support, namely that we do the
per-page store for the page at offset "index * PAGE_SIZE" within the folio,
and not for an arbitrary page. Further, we need the folio for folio_nid(),
although that could also be derived from the page. Another reason I thought
the existing signature might be preferable is that it seems to let us get the
entry's swp_entry_t with fewer computes: would calling page_swap_entry()
add more computes per page, and if so, could that add up (say, 512 times
per folio)?

I would appreciate your thoughts on whether these are valid considerations,
so I can proceed accordingly.
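
For reference, a minimal sketch of the folio_swap_entry(folio, index)
helper suggested above could look something like the following. This is
only an illustration, not code in the tree; it assumes the swap entry of
a subpage is simply the head entry advanced by the page index, which is
what page_swap_entry() does today via folio_page_idx():

/*
 * Hypothetical helper (sketch only, not in the tree): derive the
 * swp_entry_t for the page at @index directly from the folio,
 * avoiding the per-page page_folio() lookup that page_swap_entry()
 * performs.
 */
static inline swp_entry_t folio_swap_entry(struct folio *folio, long index)
{
	/* Start from the head swap entry stored in the folio. */
	swp_entry_t entry = folio->swap;

	/* Advance to the subpage at @index. */
	entry.val += index;

	return entry;
}

With such a helper, zswap_store_page() could set
"entry->swpentry = folio_swap_entry(folio, index);" and avoid both the
per-page page_folio() lookup and reconstructing the entry from
swp_type() plus the adjusted offset.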

> 
> >
> > > >         obj_cgroup_put(objcg);
> > > > -       if (zswap_pool_reached_full)
> > > > -               queue_work(shrink_wq, &zswap_shrink_work);
> > > > -check_old:
> > > > +       return false;
> > > > +}
> > > > +
> > > > +bool zswap_store(struct folio *folio)
> > > > +{
> > > > +       long nr_pages = folio_nr_pages(folio);
> > > > +       swp_entry_t swp = folio->swap;
> > > > +       struct xarray *tree = swap_zswap_tree(swp);
> > > > +       pgoff_t offset = swp_offset(swp);
> > > > +       struct obj_cgroup *objcg = NULL;
> > > > +       struct mem_cgroup *memcg = NULL;
> > > > +       struct zswap_pool *pool;
> > > > +       size_t compressed_bytes = 0;
> > >
> > > Why size_t? entry->length is int.
> >
> > In light of Willy's comment, I think size_t is a good idea.
> 
> Agreed.

Thanks Yosry, Matthew and Johannes for the resolution on this!

Thanks,
Kanchana




