Hi Chengming,

> -----Original Message-----
> From: Chengming Zhou <chengming.zhou@xxxxxxxxx>
> Sent: Thursday, August 29, 2024 9:52 PM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>; Nhat Pham
>     <nphamcs@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
>     hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; ryan.roberts@xxxxxxx;
>     Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx;
>     akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@xxxxxxxxx>;
>     Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
>     Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>;
>     Usama Arif <usamaarif642@xxxxxxxxx>
> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
>
> On 2024/8/30 03:38, Sridhar, Kanchana P wrote:
> > Hi Nhat,
> >
> >> -----Original Message-----
> >> From: Nhat Pham <nphamcs@xxxxxxxxx>
> >> Sent: Thursday, August 29, 2024 10:11 AM
> >> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> >> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> >>     hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; ryan.roberts@xxxxxxx;
> >>     Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx;
> >>     akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@xxxxxxxxx>;
> >>     Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> >>     Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>;
> >>     Usama Arif <usamaarif642@xxxxxxxxx>;
> >>     Chengming Zhou <chengming.zhou@xxxxxxxxx>
> >> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> >>
> >> On Wed, Aug 28, 2024 at 5:06 PM Sridhar, Kanchana P
> >> <kanchana.p.sridhar@xxxxxxxxx> wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Nhat Pham <nphamcs@xxxxxxxxx>
> >>>> Sent: Wednesday, August 28, 2024 2:35 PM
> >>>> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> >>>> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> >>>>     hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx;
> >>>>     ryan.roberts@xxxxxxx; Huang, Ying <ying.huang@xxxxxxxxx>;
> >>>>     21cnbao@xxxxxxxxx; akpm@linux-foundation.org;
> >>>>     Zou, Nanhai <nanhai.zou@xxxxxxxxx>;
> >>>>     Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> >>>>     Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> >>>> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> >>>>
> >>>> On Wed, Aug 28, 2024 at 2:35 AM Kanchana P Sridhar
> >>>> <kanchana.p.sridhar@xxxxxxxxx> wrote:
> >>>>>
> >>>>> Hi All,
> >>>>>
> >>>>> This patch-series enables zswap_store() to accept and store mTHP
> >>>>> folios. The most significant contribution in this series is from
> >>>>> the earlier RFC submitted by Ryan Roberts [1]. Ryan's original
> >>>>> RFC has been migrated to v6.11-rc3 in patch 2/4 of this series.
> >>>>>
> >>>>> [1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
> >>>>>      https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@xxxxxxx/T/#u
> >>>>>
> >>>>> Additionally, there is an attempt to modularize some of the
> >>>>> functionality in zswap_store(), to make it more amenable to
> >>>>> supporting any-order mTHPs. For instance, the function
> >>>>> zswap_store_entry() stores a zswap_entry in the xarray. Likewise,
> >>>>> zswap_delete_stored_offsets() can be used to delete all offsets
> >>>>> corresponding to a higher order folio stored in zswap.
> >>>>>
> >>>>
> >>>> Will this have any conflict with mTHP swap work? Especially with
> >>>> mTHP swap-in and zswap writeback.
> >>>>
> >>>> My understanding is from zswap's perspective, the large folio is
> >>>> broken apart into independent subpages, correct? What happens when
> >>>> we have partially written back mTHP (i.e some subpages are in zswap
> >>>> still, whereas others are written back to swap). Would this
> >>>> automatically prevent mTHP swapin?
> >>>
> >>> That is a good point. To begin with, this patch-series would make
> >>> the default behavior for mTHP swapout/storage and swapin for ZSWAP
> >>> to be on par with ZRAM. From zswap's perspective, imo this is a
> >>> significant step forward towards realizing cold memory storage with
> >>> mTHP folios. However, it is only a starting point that makes the
> >>> behavior uniform across zswap/zram. Initially, workloads would see a
> >>> one-time benefit with reclaim being able to swapout mTHP folios
> >>> without splitting, to zswap. If the mTHPs were cold memory, then we
> >>> would have derived latency gains towards memory savings (with
> >>> zswap).
> >>>
> >>> However, if the mTHP were part of "not so cold" memory, this would
> >>> result in a one-way mTHP conversion to 4K folios. Depending on
> >>> workloads and their access patterns, we could either see individual
> >>> 4K folios being swapped in, or entire chunks if not the entire
> >>> (original) mTHP needing to be swapped in.
> >>>
> >>> It should be noted that this is more of a performance vs. cold
> >>> memory preservation trade-off that needs to drive mTHP reclaim,
> >>> storage, swapin and writeback policy. Different workloads could
> >>> require different policies. However, even though this patch is only
> >>> a starting point, it is still functionally correct by being
> >>> equivalent to zram-mTHP, and compatible with the rest of mm and swap
> >>> as far as mTHP. Another important functionality/data consistency
> >>> decision I made in this patch series is error handling during
> >>> zswap_store() of mTHP: in case of any errors, all swap offsets for
> >>> the mTHP are deleted from the zswap xarray/zpool, since we know that
> >>> the mTHP will now have to be stored in the backing swap device. IOW,
> >>> an mTHP is either entirely stored in zswap, or entirely not stored
> >>> in zswap.
> >>>
> >>> To answer your question, we would need to come up with what the
> >>> semantics would need to be for zswap zpool storage granularity,
> >>> swapin granularity, readahead granularity and writeback wrt mTHP and
> >>> how the overall swap sub-system needs to "preserve" mTHP vs.
> >>> splitting mTHP into 4K/lower-order folios during swapout. Once we
> >>> have a good understanding of these policies, we could implement them
> >>> in zswap. Alternately, develop an abstraction that is one level
> >>> above zswap/zram and makes things easier and shareable between zswap
> >>> and zram. By this, I mean fundamental assumptions such as
> >>> consecutive swap offsets (for instance). To some extent, this
> >>> implies that an mTHP as a swap entity is defined by consecutiveness
> >>> of swap offsets. Maybe the policy to keep mTHPs in the system over
> >>> extended duration might be to assemble them dynamically based on
> >>> swapin_readahead() decisions (which is based on workload access
> >>> patterns). In other words, mTHPs could be a useful abstraction that
> >>> can be static or even dynamic based on working set characteristics,
> >>> and cold memory preservation. This is quite a complex topic imho.
> >>>
> >>> As we know, Barry Song and Chuanhua Han have started the discussion
> >>> on this in their zram mTHP swapin series [1].
> >>
> >> Yeah I'm a bit more concerned with the correctness aspect. As long as
> >> it's not buggy, then we can implement mTHP zswapout first, and force
> >> individual subpage (z)swapin for now (since we cannot control
> >> writeback from writing individual subpages).
> >
> > Absolutely, this sounds like the way to go!
> >
> >>
> >> We can discuss strategy to harmonize mTHP, zswap (with writeback) as
> >> we go along.
> >
> > Sounds great :)
> >
> >>
> >> BTW, I think we're not cc-ing Chengming? Is the get_maintainers script
> >> not working properly... Let me manually add him in - please include
> >> him in future submission and responses, as he is also a zswap reviewer
> >> :)
> >
> > I think when I ran get_maintainers.pl, I was in v6.10. For sure, will
> > include Chengming in future submissions and responses :)
>
> Maybe a little late for the party, will take a look ASAP.
> It's an interesting and great work.

Thanks! Appreciate your code review and suggestions to improve the
patchset.

Thanks,
Kanchana

>
> Thanks!
>
> >
> >>
> >> Also cc-ing Usama who is interested in this work.
> >
> > Sounds great.
> >
> > Thanks,
> > Kanchana
> >
> >>
> >>>
> >>> [1] https://lore.kernel.org/all/20240821074541.516249-3-hanchuanhua@xxxxxxxx/T/#u
> >>>
> >>> Thanks,
> >>> Kanchana