RE: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios

Hi Nhat,

> -----Original Message-----
> From: Nhat Pham <nphamcs@xxxxxxxxx>
> Sent: Thursday, August 29, 2024 10:11 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; ryan.roberts@xxxxxxx;
> Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx;
> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>;
> Usama Arif <usamaarif642@xxxxxxxxx>; Chengming Zhou
> <chengming.zhou@xxxxxxxxx>
> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> 
> On Wed, Aug 28, 2024 at 5:06 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@xxxxxxxxx> wrote:
> >
> >
> > > -----Original Message-----
> > > From: Nhat Pham <nphamcs@xxxxxxxxx>
> > > Sent: Wednesday, August 28, 2024 2:35 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> > > hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; ryan.roberts@xxxxxxx;
> > > Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx;
> > > akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> > > <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> > > Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> > >
> > > On Wed, Aug 28, 2024 at 2:35 AM Kanchana P Sridhar
> > > <kanchana.p.sridhar@xxxxxxxxx> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > This patch-series enables zswap_store() to accept and store mTHP
> > > > folios. The most significant contribution in this series is from the
> > > > earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has been
> > > > migrated to v6.11-rc3 in patch 2/4 of this series.
> > > >
> > > > [1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
> > > >      https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@xxxxxxx/T/#u
> > > >
> > > > Additionally, there is an attempt to modularize some of the functionality
> > > > in zswap_store(), to make it more amenable to supporting any-order
> > > > mTHPs. For instance, the function zswap_store_entry() stores a zswap_entry
> > > > in the xarray. Likewise, zswap_delete_stored_offsets() can be used to
> > > > delete all offsets corresponding to a higher order folio stored in zswap.
> > > >
> > >
> > > Will this have any conflict with mTHP swap work? Especially with mTHP
> > > swap-in and zswap writeback.
> > >
> > > My understanding is from zswap's perspective, the large folio is
> > > broken apart into independent subpages, correct? What happens when we
> > > have partially written back mTHP (i.e. some subpages are in zswap
> > > still, whereas others are written back to swap). Would this
> > > automatically prevent mTHP swapin?
> >
> > That is a good point. To begin with, this patch-series would bring the
> > default behavior for mTHP swap-out/storage and swap-in with ZSWAP on par
> > with ZRAM. From zswap's perspective, imo this is a significant step forward
> > towards realizing cold memory storage with mTHP folios. However, it is only
> > a starting point that makes the behavior uniform across zswap/zram.
> > Initially, workloads would see a one-time benefit from reclaim being able
> > to swap out mTHP folios to zswap without splitting them. If the mTHPs were
> > cold memory, we would have achieved the memory savings of zswap with
> > lower latency.
> >
> > However, if the mTHP were part of "not so cold" memory, this would result
> > in a one-way conversion of the mTHP into 4K folios. Depending on workloads
> > and their access patterns, we could see either individual 4K folios being
> > swapped in, or large chunks, if not the entire (original) mTHP, needing to
> > be swapped in.
> >
> > It should be noted that this is more of a performance vs. cold memory
> > preservation trade-off that needs to drive mTHP reclaim, storage, swap-in
> > and writeback policy. Different workloads could require different policies.
> > However, even though this patch-series is only a starting point, it is still
> > functionally correct, being equivalent to zram-mTHP, and compatible with the
> > rest of mm and swap as far as mTHP is concerned. Another important
> > functionality/data consistency decision I made in this patch-series is
> > error handling during zswap_store() of mTHP: in case of any error, all swap
> > offsets for the mTHP are deleted from the zswap xarray/zpool, since we know
> > that the mTHP will now have to be stored in the backing swap device. IOW,
> > an mTHP is either entirely stored in zswap, or not stored in zswap at all.
> >
> > To answer your question, we would need to define the semantics of zswap
> > zpool storage granularity, swap-in granularity, readahead granularity and
> > writeback with respect to mTHP, and decide how the overall swap sub-system
> > should "preserve" mTHP vs. split mTHP into 4K/lower-order folios during
> > swap-out. Once we have a good understanding of these policies, we could
> > implement them in zswap. Alternatively, we could develop an abstraction one
> > level above zswap/zram that makes such policies easier to share between
> > zswap and zram. By this, I mean fundamental assumptions such as consecutive
> > swap offsets (for instance). To some extent, this implies that an mTHP, as a
> > swap entity, is defined by the consecutiveness of its swap offsets. Maybe
> > the policy for keeping mTHPs in the system over extended durations could be
> > to assemble them dynamically based on swapin_readahead() decisions (which
> > are based on workload access patterns). In other words, mTHPs could be a
> > useful abstraction that is static, or even dynamic based on working set
> > characteristics and cold memory preservation. This is quite a complex topic
> > imho.
> >
> > As we know, Barry Song and Chuanhua Han have started the discussion on
> > this in their zram mTHP swapin series [1].
> 
> Yeah I'm a bit more concerned with the correctness aspect. As long as
> it's not buggy, then we can implement mTHP zswapout first, and force
> individual subpage (z)swapin for now (since we cannot control
> writeback from writing individual subpages).

Absolutely, this sounds like the way to go!

> 
> We can discuss strategy to harmonize mTHP, zswap (with writeback) as
> we go along.

Sounds great :)

> 
> BTW, I think we're not cc-ing Chengming? Is the get_maintainers script
> not working properly... Let me manually add him in - please include
> him in future submission and responses, as he is also a zswap reviewer
> :)

I think I was on v6.10 when I ran get_maintainers.pl. I will definitely include
Chengming in future submissions and responses :)

> 
> Also cc-ing Usama who is interested in this work.

Sounds great.

Thanks,
Kanchana

> 
> >
> > [1] https://lore.kernel.org/all/20240821074541.516249-3-hanchuanhua@xxxxxxxx/T/#u
> >
> > Thanks,
> > Kanchana