On Thu, Aug 15, 2024 at 5:40 AM Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx> wrote: > > Hi Barry, > > > -----Original Message----- > > From: Barry Song <21cnbao@xxxxxxxxx> > > Sent: Wednesday, August 14, 2024 12:49 AM > > To: Sridhar, Kanchana P <kanchana.p.sridhar@xxxxxxxxx> > > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; > > hannes@xxxxxxxxxxx; yosryahmed@xxxxxxxxxx; nphamcs@xxxxxxxxx; > > ryan.roberts@xxxxxxx; Huang, Ying <ying.huang@xxxxxxxxx>; akpm@linux- > > foundation.org; Zou, Nanhai <nanhai.zou@xxxxxxxxx>; Feghali, Wajdi K > > <wajdi.k.feghali@xxxxxxxxx>; Gopal, Vinodh <vinodh.gopal@xxxxxxxxx> > > Subject: Re: [RFC PATCH v1 2/4] mm: vmstat: Per mTHP-size zswap_store > > vmstat event counters. > > > > On Wed, Aug 14, 2024 at 6:28 PM Kanchana P Sridhar > > <kanchana.p.sridhar@xxxxxxxxx> wrote: > > > > > > Added vmstat event counters per mTHP-size that can be used to account > > > for folios of different sizes being successfully stored in ZSWAP. > > > > > > For this RFC, it is not clear if these zswpout counters should instead > > > be added as part of the existing mTHP stats in > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats. > > > > > > The following is also a viable option, should it make better sense, > > > for instance, as: > > > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout. > > > > > > If so, we would be able to distinguish between mTHP zswap and > > > non-zswap swapouts through: > > > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout > > > > > > and > > > > > > /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/swpout > > > > > > respectively. > > > > > > Comments would be appreciated as to which approach is preferable. > > > > Even though swapout might go through zswap, from the perspective of > > the mm core, it shouldn't be aware of that. Shouldn't zswpout be part > > of swpout? Why are they separate? no matter if a mTHP has been > > put in zswap, it has been swapped-out to mm-core? No? > > Thanks for the code review comments. This is a good point. I was keeping in > mind the convention used by existing vmstat event counters that distinguish > zswpout/zswpin from pswpout/pswpin events. > > If we want to keep the distinction in mTHP swapouts, would adding a > separate MTHP_STAT_ZSWPOUT to "enum mthp_stat_item" be Ok? > I'm not entirely sure how important the zswpout counter is. To me, it doesn't seem as critical as swpout and swpout_fallback, which are more useful for system profiling. zswapout feels more like an internal detail related to how the swap-out process is handled? If this is the case, we might not need this per-size counter. Otherwise, I believe sysfs is a better place to avoid all the chaos in vmstat to handle various orders and sizes. So the question is, per-size zswpout counter is really important or just for debugging purposes? > In any case, it looks like all that would be needed is a call to > count_mthp_stat(folio_order(folio), MTHP_STAT_[Z]SWPOUT) in the > general case. > > I will make this change in v2, depending on whether or not the > separation of zswpout vs. non-zswap swpout is recommended for > mTHP. > > > > > > > > > > > Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@xxxxxxxxx> > > > --- > > > include/linux/vm_event_item.h | 15 +++++++++++++++ > > > mm/vmstat.c | 15 +++++++++++++++ > > > 2 files changed, 30 insertions(+) > > > > > > diff --git a/include/linux/vm_event_item.h > > b/include/linux/vm_event_item.h > > > index 747943bc8cc2..2451bcfcf05c 100644 > > > --- a/include/linux/vm_event_item.h > > > +++ b/include/linux/vm_event_item.h > > > @@ -114,6 +114,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, > > PSWPIN, PSWPOUT, > > > THP_ZERO_PAGE_ALLOC, > > > THP_ZERO_PAGE_ALLOC_FAILED, > > > THP_SWPOUT, > > > +#ifdef CONFIG_ZSWAP > > > + ZSWPOUT_PMD_THP_FOLIO, > > > +#endif > > > THP_SWPOUT_FALLBACK, > > > #endif > > > #ifdef CONFIG_MEMORY_BALLOON > > > @@ -143,6 +146,18 @@ enum vm_event_item { PGPGIN, PGPGOUT, > > PSWPIN, PSWPOUT, > > > ZSWPIN, > > > ZSWPOUT, > > > ZSWPWB, > > > + ZSWPOUT_4KB_FOLIO, > > > +#ifdef CONFIG_THP_SWAP > > > + mTHP_ZSWPOUT_8kB, > > > + mTHP_ZSWPOUT_16kB, > > > + mTHP_ZSWPOUT_32kB, > > > + mTHP_ZSWPOUT_64kB, > > > + mTHP_ZSWPOUT_128kB, > > > + mTHP_ZSWPOUT_256kB, > > > + mTHP_ZSWPOUT_512kB, > > > + mTHP_ZSWPOUT_1024kB, > > > + mTHP_ZSWPOUT_2048kB, > > > +#endif > > > > This implementation hardcodes assumptions about the page size being 4KB, > > but page sizes can vary, and so can the THP orders? > > Agreed, will address in v2. > > > > > > #endif > > > #ifdef CONFIG_X86 > > > DIRECT_MAP_LEVEL2_SPLIT, > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > index 8507c497218b..0e66c8b0c486 100644 > > > --- a/mm/vmstat.c > > > +++ b/mm/vmstat.c > > > @@ -1375,6 +1375,9 @@ const char * const vmstat_text[] = { > > > "thp_zero_page_alloc", > > > "thp_zero_page_alloc_failed", > > > "thp_swpout", > > > +#ifdef CONFIG_ZSWAP > > > + "zswpout_pmd_thp_folio", > > > +#endif > > > "thp_swpout_fallback", > > > #endif > > > #ifdef CONFIG_MEMORY_BALLOON > > > @@ -1405,6 +1408,18 @@ const char * const vmstat_text[] = { > > > "zswpin", > > > "zswpout", > > > "zswpwb", > > > + "zswpout_4kb_folio", > > > +#ifdef CONFIG_THP_SWAP > > > + "mthp_zswpout_8kb", > > > + "mthp_zswpout_16kb", > > > + "mthp_zswpout_32kb", > > > + "mthp_zswpout_64kb", > > > + "mthp_zswpout_128kb", > > > + "mthp_zswpout_256kb", > > > + "mthp_zswpout_512kb", > > > + "mthp_zswpout_1024kb", > > > + "mthp_zswpout_2048kb", > > > +#endif > > > > The issue here is that the number of THP orders > > can vary across different platforms. > > Agreed, will address in v2. > > Thanks, > Kanchana > > > > > > #endif > > > #ifdef CONFIG_X86 > > > "direct_map_level2_splits", > > > -- > > > 2.27.0 > > > Thanks Barry