RE: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios


Hi Chengming,

> -----Original Message-----
> From: Chengming Zhou <chengming.zhou@xxxxxxxxx>
> Sent: Thursday, August 29, 2024 9:52 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; Nhat Pham
> <nphamcs@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; ryan.roberts@xxxxxxx;
> Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux-foundation.org;
> Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>;
> Usama Arif <usamaarif642@xxxxxxxxx>
> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> 
> On 2024/8/30 03:38, Sridhar, Kanchana P wrote:
> > Hi Nhat,
> >
> >> -----Original Message-----
> >> From: Nhat Pham <nphamcs@xxxxxxxxx>
> >> Sent: Thursday, August 29, 2024 10:11 AM
> >> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> >> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> >> hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; ryan.roberts@xxxxxxx;
> >> Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux-foundation.org;
> >> Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> >> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>;
> >> Usama Arif <usamaarif642@xxxxxxxxx>; Chengming Zhou
> >> <chengming.zhou@xxxxxxxxx>
> >> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> >>
> >> On Wed, Aug 28, 2024 at 5:06 PM Sridhar, Kanchana P
> >> <kanchana.p.sridhar@xxxxxxxxx> wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Nhat Pham <nphamcs@xxxxxxxxx>
> >>>> Sent: Wednesday, August 28, 2024 2:35 PM
> >>>> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> >>>> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> >>>> hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; ryan.roberts@xxxxxxx;
> >>>> Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx; akpm@linux-foundation.org;
> >>>> Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K
> >>>> <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> >>>> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> >>>>
> >>>> On Wed, Aug 28, 2024 at 2:35 AM Kanchana P Sridhar
> >>>> <kanchana.p.sridhar@xxxxxxxxx> wrote:
> >>>>>
> >>>>> Hi All,
> >>>>>
> >>>>> This patch-series enables zswap_store() to accept and store mTHP
> >>>>> folios. The most significant contribution in this series is from the
> >>>>> earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has been
> >>>>> migrated to v6.11-rc3 in patch 2/4 of this series.
> >>>>>
> >>>>> [1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
> >>>>>       https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@xxxxxxx/T/#u
> >>>>>
> >>>>> Additionally, there is an attempt to modularize some of the
> >>>>> functionality in zswap_store(), to make it more amenable to supporting
> >>>>> any-order mTHPs. For instance, the function zswap_store_entry() stores a
> >>>>> zswap_entry in the xarray. Likewise, zswap_delete_stored_offsets() can be
> >>>>> used to delete all offsets corresponding to a higher order folio stored
> >>>>> in zswap.
> >>>>>
> >>>>
> >>>> Will this have any conflict with mTHP swap work? Especially with mTHP
> >>>> swap-in and zswap writeback.
> >>>>
> >>>> My understanding is that, from zswap's perspective, the large folio is
> >>>> broken apart into independent subpages, correct? What happens when we
> >>>> have a partially written back mTHP (i.e. some subpages are still in
> >>>> zswap, whereas others are written back to swap)? Would this
> >>>> automatically prevent mTHP swapin?
> >>>
> >>> That is a good point. To begin with, this patch-series would make the
> >>> default behavior for mTHP swapout/storage and swapin for ZSWAP on par
> >>> with ZRAM. From zswap's perspective, imo this is a significant step
> >>> forward towards realizing cold memory storage with mTHP folios. However,
> >>> it is only a starting point that makes the behavior uniform across
> >>> zswap/zram. Initially, workloads would see a one-time benefit from
> >>> reclaim being able to swap out mTHP folios to zswap without splitting
> >>> them. If the mTHPs were cold memory, then we would have derived latency
> >>> gains towards memory savings (with zswap).
> >>>
> >>> However, if the mTHP were part of "not so cold" memory, this would result
> >>> in a one-way mTHP conversion to 4K folios. Depending on workloads and
> >>> their access patterns, we could see either individual 4K folios being
> >>> swapped in, or entire chunks, if not the entire (original) mTHP, needing
> >>> to be swapped in.
> >>>
> >>> It should be noted that this is more of a performance vs. cold memory
> >>> preservation trade-off that needs to drive mTHP reclaim, storage, swapin
> >>> and writeback policy. Different workloads could require different
> >>> policies. However, even though this patch is only a starting point, it is
> >>> still functionally correct by being equivalent to zram-mTHP, and
> >>> compatible with the rest of mm and swap as far as mTHP is concerned.
> >>> Another important functionality/data consistency decision I made in this
> >>> patch series is error handling during zswap_store() of mTHP: in case of
> >>> any errors, all swap offsets for the mTHP are deleted from the zswap
> >>> xarray/zpool, since we know that the mTHP will now have to be stored in
> >>> the backing swap device. IOW, an mTHP is either entirely stored in zswap,
> >>> or entirely not stored in zswap.
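The all-or-nothing error handling described above can be modeled in a few lines of user-space C. This is a minimal sketch, not the patch's code: a plain bool array stands in for the zswap xarray/zpool, and model_store_entry(), model_delete_stored_offsets() and model_store_folio() are hypothetical stand-ins loosely named after the zswap_store_entry() and zswap_delete_stored_offsets() helpers mentioned in the cover letter. A failure on any subpage rolls back every offset of the folio.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the zswap xarray: one flag per swap offset. */
#define MODEL_NR_OFFSETS 64
static bool model_stored[MODEL_NR_OFFSETS];

/* Hypothetical failure injection: storing this offset fails (-1 = never). */
static long model_fail_offset = -1;

/* Stand-in for storing one subpage's entry at a swap offset. */
static bool model_store_entry(size_t offset)
{
	if (offset >= MODEL_NR_OFFSETS || (long)offset == model_fail_offset)
		return false;
	model_stored[offset] = true;
	return true;
}

/* Stand-in for deleting every offset that belongs to the folio. */
static void model_delete_stored_offsets(size_t start, size_t nr)
{
	for (size_t i = 0; i < nr && start + i < MODEL_NR_OFFSETS; i++)
		model_stored[start + i] = false;
}

/*
 * Store all nr subpages of a folio whose first swap offset is start.
 * On any failure, roll back so the folio is either entirely stored or
 * not stored at all (it then has to go to the backing swap device).
 */
static bool model_store_folio(size_t start, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (!model_store_entry(start + i)) {
			model_delete_stored_offsets(start, nr);
			return false;
		}
	}
	return true;
}
```

Rolling back the whole range keeps a folio from being split between zswap and the backing swap device at store time; partial placement can still arise later through writeback, which is the concern raised in the question above.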
> >>>
> >>> To answer your question, we would need to work out the semantics of
> >>> zswap zpool storage granularity, swapin granularity, readahead
> >>> granularity and writeback with respect to mTHP, and how the overall swap
> >>> sub-system needs to "preserve" mTHP vs. split mTHP into 4K/lower-order
> >>> folios during swapout. Once we have a good understanding of these
> >>> policies, we could implement them in zswap. Alternatively, we could
> >>> develop an abstraction one level above zswap/zram that makes things
> >>> easier and shareable between zswap and zram. By this, I mean fundamental
> >>> assumptions such as consecutive swap offsets (for instance). To some
> >>> extent, this implies that an mTHP as a swap entity is defined by the
> >>> consecutiveness of its swap offsets. Maybe the policy to keep mTHPs in
> >>> the system over an extended duration might be to assemble them
> >>> dynamically based on swapin_readahead() decisions (which are based on
> >>> workload access patterns). In other words, mTHPs could be a useful
> >>> abstraction that can be static or even dynamic based on working set
> >>> characteristics and cold memory preservation. This is quite a complex
> >>> topic imho.
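The idea that an mTHP, as a swap entity, is defined by consecutive swap offsets can be expressed as a simple predicate. The helper below, offsets_form_mthp(), is hypothetical and only checks consecutiveness; any alignment or size requirements a real swap allocator may impose are deliberately left out of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: treat the nr swap offsets in offsets[] as one
 * mTHP-sized swap entity only if they are consecutive, i.e.
 * offsets[i] == offsets[0] + i for every i.
 */
static bool offsets_form_mthp(const unsigned long *offsets, size_t nr)
{
	if (nr == 0)
		return false;
	for (size_t i = 1; i < nr; i++) {
		if (offsets[i] != offsets[0] + i)
			return false;
	}
	return true;
}
```

A readahead-driven policy of the kind suggested above could, for example, apply such a check to the offsets in a readahead window before deciding whether to assemble them into a larger folio.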
> >>>
> >>> As we know, Barry Song and Chuanhua Han have started the discussion on
> >>> this in their zram mTHP swapin series [1].
> >>
> >> Yeah I'm a bit more concerned with the correctness aspect. As long as
> >> it's not buggy, then we can implement mTHP zswapout first, and force
> >> individual subpage (z)swapin for now (since we cannot control
> >> writeback from writing individual subpages).
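Forcing individual subpage (z)swapin can be sketched as a swapin-granularity decision: if any subpage of the range is still in zswap, fall back to order 0. This is a minimal sketch under assumed names (enum subpage_loc, pick_swapin_order()); it is not an existing kernel interface and ignores everything else a real swapin path has to check.

```c
#include <stddef.h>

/* Hypothetical: where each 4K subpage of a swapped-out range lives. */
enum subpage_loc {
	LOC_ZSWAP,	/* still compressed in zswap */
	LOC_SWAPDEV,	/* written back to the backing swap device */
};

/*
 * Hypothetical policy: return the folio order to use when swapping in
 * the (1 << order) subpages described by loc[]. If any subpage is still
 * in zswap, fall back to order 0 so each subpage is swapped in on its
 * own; otherwise keep the requested order.
 */
static unsigned int pick_swapin_order(const enum subpage_loc *loc,
				      unsigned int order)
{
	size_t nr = (size_t)1 << order;

	for (size_t i = 0; i < nr; i++) {
		if (loc[i] == LOC_ZSWAP)
			return 0;
	}
	return order;
}
```

Because writeback can move any single subpage out of zswap, checking only the first subpage would not be enough; the sketch scans the whole range.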
> >
> > Absolutely, this sounds like the way to go!
> >
> >>
> >> We can discuss strategy to harmonize mTHP, zswap (with writeback) as
> >> we go along.
> >
> > Sounds great :)
> >
> >>
> >> BTW, I think we're not cc-ing Chengming? Is the get_maintainers script
> >> not working properly... Let me manually add him in - please include
> >> him in future submission and responses, as he is also a zswap reviewer
> >> :)
> >
> > I think when I ran get_maintainers.pl, I was in v6.10. For sure, will include
> > Chengming in future submissions and responses :)
> 
> Maybe a little late to the party, but I will take a look ASAP.
> It's interesting and great work.

Thanks! Appreciate your code review and suggestions to improve
the patchset.

Thanks,
Kanchana

> 
> Thanks!
> 
> >
> >>
> >> Also cc-ing Usama who is interested in this work.
> >
> > Sounds great.
> >
> > Thanks,
> > Kanchana
> >
> >>
> >>>
> >>> [1] https://lore.kernel.org/all/20240821074541.516249-3-hanchuanhua@xxxxxxxx/T/#u
> >>>
> >>> Thanks,
> >>> Kanchana



