Hi Nhat,

> -----Original Message-----
> From: Nhat Pham <nphamcs@xxxxxxxxx>
> Sent: Thursday, August 29, 2024 10:11 AM
> To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; ryan.roberts@xxxxxxx;
> Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx;
> akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@xxxxxxxxx>;
> Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>;
> Usama Arif <usamaarif642@xxxxxxxxx>;
> Chengming Zhou <chengming.zhou@xxxxxxxxx>
> Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
>
> On Wed, Aug 28, 2024 at 5:06 PM Sridhar, Kanchana P
> <kanchana.p.sridhar@xxxxxxxxx> wrote:
> >
> > > -----Original Message-----
> > > From: Nhat Pham <nphamcs@xxxxxxxxx>
> > > Sent: Wednesday, August 28, 2024 2:35 PM
> > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx>
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> > > hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; ryan.roberts@xxxxxxx;
> > > Huang, Ying <ying.huang@xxxxxxxxx>; 21cnbao@xxxxxxxxx;
> > > akpm@linux-foundation.org; Zou, Nanhai <nanhai.zou@xxxxxxxxx>;
> > > Feghali, Wajdi K <wajdi.k.feghali@xxxxxxxxx>;
> > > Gopal, Vinodh <vinodh.gopal@xxxxxxxxx>
> > > Subject: Re: [PATCH v5 0/3] mm: ZSWAP swap-out of mTHP folios
> > >
> > > On Wed, Aug 28, 2024 at 2:35 AM Kanchana P Sridhar
> > > <kanchana.p.sridhar@xxxxxxxxx> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > This patch-series enables zswap_store() to accept and store mTHP
> > > > folios. The most significant contribution in this series is from the
> > > > earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has
> > > > been migrated to v6.11-rc3 in patch 2/4 of this series.
> > > >
> > > > [1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
> > > > https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@xxxxxxx/T/#u
> > > >
> > > > Additionally, there is an attempt to modularize some of the
> > > > functionality in zswap_store(), to make it more amenable to
> > > > supporting any-order mTHPs. For instance, the function
> > > > zswap_store_entry() stores a zswap_entry in the xarray. Likewise,
> > > > zswap_delete_stored_offsets() can be used to delete all offsets
> > > > corresponding to a higher order folio stored in zswap.
> > > >
> > >
> > > Will this have any conflict with mTHP swap work? Especially with mTHP
> > > swap-in and zswap writeback.
> > >
> > > My understanding is from zswap's perspective, the large folio is
> > > broken apart into independent subpages, correct? What happens when we
> > > have partially written back mTHP (i.e some subpages are in zswap
> > > still, whereas others are written back to swap). Would this
> > > automatically prevent mTHP swapin?
> >
> > That is a good point. To begin with, this patch-series would make the
> > default behavior for mTHP swapout/storage and swapin for ZSWAP to be on
> > par with ZRAM. From zswap's perspective, imo this is a significant step
> > forward towards realizing cold memory storage with mTHP folios. However,
> > it is only a starting point that makes the behavior uniform across
> > zswap/zram. Initially, workloads would see a one-time benefit with
> > reclaim being able to swapout mTHP folios without splitting, to zswap.
> > If the mTHPs were cold memory, then we would have derived latency gains
> > towards memory savings (with zswap).
> >
> > However, if the mTHP were part of "not so cold" memory, this would
> > result in a one-way mTHP conversion to 4K folios.
> > Depending on workloads and their access patterns, we could either see
> > individual 4K folios being swapped in, or entire chunks if not the
> > entire (original) mTHP needing to be swapped in.
> >
> > It should be noted that this is more of a performance vs. cold memory
> > preservation trade-off that needs to drive mTHP reclaim, storage, swapin
> > and writeback policy. Different workloads could require different
> > policies. However, even though this patch is only a starting point, it
> > is still functionally correct by being equivalent to zram-mTHP, and
> > compatible with the rest of mm and swap as far as mTHP. Another
> > important functionality/data consistency decision I made in this patch
> > series is error handling during zswap_store() of mTHP: in case of any
> > errors, all swap offsets for the mTHP are deleted from the zswap
> > xarray/zpool, since we know that the mTHP will now have to be stored in
> > the backing swap device. IOW, an mTHP is either entirely stored in
> > zswap, or entirely not stored in zswap.
> >
> > To answer your question, we would need to come up with what the
> > semantics would need to be for zswap zpool storage granularity, swapin
> > granularity, readahead granularity and writeback wrt mTHP and how the
> > overall swap sub-system needs to "preserve" mTHP vs. splitting mTHP into
> > 4K/lower-order folios during swapout. Once we have a good understanding
> > of these policies, we could implement them in zswap. Alternately,
> > develop an abstraction that is one level above zswap/zram and makes
> > things easier and shareable between zswap and zram. By this, I mean
> > fundamental assumptions such as consecutive swap offsets (for instance).
> > To some extent, this implies that an mTHP as a swap entity is defined by
> > consecutiveness of swap offsets.
> > Maybe the policy to keep mTHPs in the system over extended duration
> > might be to assemble them dynamically based on swapin_readahead()
> > decisions (which is based on workload access patterns). In other words,
> > mTHPs could be a useful abstraction that can be static or even dynamic
> > based on working set characteristics, and cold memory preservation.
> > This is quite a complex topic imho.
> >
> > As we know, Barry Song and Chuanhua Han have started the discussion on
> > this in their zram mTHP swapin series [1].
>
> Yeah I'm a bit more concerned with the correctness aspect. As long as
> it's not buggy, then we can implement mTHP zswapout first, and force
> individual subpage (z)swapin for now (since we cannot control
> writeback from writing individual subpages).

Absolutely, this sounds like the way to go!

>
> We can discuss strategy to harmonize mTHP, zswap (with writeback) as
> we go along.

Sounds great :)

>
> BTW, I think we're not cc-ing Chengming? Is the get_maintainers script
> not working properly... Let me manually add him in - please include
> him in future submission and responses, as he is also a zswap reviewer
> :)

I think when I ran get_maintainers.pl, I was in v6.10. For sure, will
include Chengming in future submissions and responses :)

>
> Also cc-ing Usama who is interested in this work.

Sounds great.

Thanks,
Kanchana

> >
> > [1] https://lore.kernel.org/all/20240821074541.516249-3-hanchuanhua@xxxxxxxx/T/#u
> >
> > Thanks,
> > Kanchana
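
For readers following the thread, the "entirely stored in zswap, or
entirely not stored" error-handling rule described above can be
illustrated with a small user-space toy model. This is only a sketch
under stated assumptions, not the code from the patch series: every
name here (toy_store, toy_store_page, toy_delete_offsets,
toy_store_folio, fail_at) is hypothetical, a plain bool array stands in
for the zswap xarray/zpool, and a forced failure at one offset stands
in for a compression or allocation error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_OFFSETS 64 /* hypothetical size of the toy swap-offset space */

/* Toy stand-in for the zswap xarray: stored[i] is true if offset i is held. */
struct toy_store {
	bool stored[TOY_NR_OFFSETS];
	long fail_at; /* offset whose store is forced to fail; -1 for none */
};

static void toy_init(struct toy_store *s, long fail_at)
{
	for (size_t i = 0; i < TOY_NR_OFFSETS; i++)
		s->stored[i] = false;
	s->fail_at = fail_at;
}

/* Store one subpage; fails only at the injected offset. */
static bool toy_store_page(struct toy_store *s, size_t offset)
{
	if ((long)offset == s->fail_at)
		return false;
	s->stored[offset] = true;
	return true;
}

/* Roll back: drop every offset in [first, first + nr). */
static void toy_delete_offsets(struct toy_store *s, size_t first, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		s->stored[first + i] = false;
}

/*
 * Store nr consecutive offsets starting at first. On any failure, delete
 * the whole range so the folio is never left partially stored.
 */
static bool toy_store_folio(struct toy_store *s, size_t first, size_t nr)
{
	assert(first + nr <= TOY_NR_OFFSETS);
	for (size_t i = 0; i < nr; i++) {
		if (!toy_store_page(s, first + i)) {
			toy_delete_offsets(s, first, nr);
			return false;
		}
	}
	return true;
}

/* Count how many offsets are currently held in the toy store. */
static size_t toy_count(const struct toy_store *s)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_NR_OFFSETS; i++)
		n += s->stored[i];
	return n;
}
```

In this model, a 16-page folio stored at offsets 16..31 with fail_at
set to 20 makes toy_store_folio() return false and leaves none of those
offsets stored, so the caller can fall back to the backing swap device
for the whole folio.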