Re: [PATCH RFC 3/4] mm/vmstat: rename pgdemote_* to pgdemote_dst_* and add pgdemote_src_*

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Yasunori Gotou (Fujitsu)" <y-goto@xxxxxxxxxxx> writes:

> Hello,
>
>> On 02/11/2023 13:45, Huang, Ying wrote:
>> > Li Zhijian <lizhijian@xxxxxxxxxxx> writes:
>> >
>> >> pgdemote_src_*: pages demoted from this node.
>> >> pgdemote_dst_*: pages demoted to this node.
>> >>
>> >> So that we are able to know their demotion per-node stats by checking this.
>> >>
>> >> In the environment, node0 and node1 are DRAM, node3 is PMEM.
>> >>
>> >> Global stats:
>> >> $ grep -E 'demote' /proc/vmstat
>> >> pgdemote_src_kswapd 130155
>> >> pgdemote_src_direct 113497
>> >> pgdemote_src_khugepaged 0
>> >> pgdemote_dst_kswapd 130155
>> >> pgdemote_dst_direct 113497
>> >> pgdemote_dst_khugepaged 0
>> >>
>> >> Per-node stats:
>> >> $ grep demote /sys/devices/system/node/node0/vmstat
>> >> pgdemote_src_kswapd 68454
>> >> pgdemote_src_direct 83431
>> >> pgdemote_src_khugepaged 0
>> >> pgdemote_dst_kswapd 0
>> >> pgdemote_dst_direct 0
>> >> pgdemote_dst_khugepaged 0
>> >>
>> >> $ grep demote /sys/devices/system/node/node1/vmstat
>> >> pgdemote_src_kswapd 185834
>> >> pgdemote_src_direct 30066
>> >> pgdemote_src_khugepaged 0
>> >> pgdemote_dst_kswapd 0
>> >> pgdemote_dst_direct 0
>> >> pgdemote_dst_khugepaged 0
>> >>
>> >> $ grep demote /sys/devices/system/node/node3/vmstat
>> >> pgdemote_src_kswapd 0
>> >> pgdemote_src_direct 0
>> >> pgdemote_src_khugepaged 0
>> >> pgdemote_dst_kswapd 254288
>> >> pgdemote_dst_direct 113497
>> >> pgdemote_dst_khugepaged 0
>> >>
>> >>  From above stats, we know node3 is the demotion destination which one
>> >> the node0 and node1 will demote to.
>> >
>> > Why do we need these information?  Do you have some use case?
>> 
>> I recall our customers have mentioned that they want to know how much the
>> memory is demoted
>> to the CXL memory device in a specific period.
>
> I'll mention about it more.
>
> I had a conversation with one of our customers. He expressed a desire for more detailed
> profile information to analyze the behavior of demotion (and promotion) when
> his workloads are executed. 
> If the results are not satisfactory for his workloads, he wants to tune his servers for his workloads
> with these profiles.
> Additionally, depending on the results, he may want to change his server configuration. 
> For example, he may want to buy more expensive DDR memories rather than cheaper CXL memory.
>
> In my impression, our customers seems to think that CXL memory is NOT as reliable as DDR memory yet.
> Therefore, they want to prepare for the new world that CXL will bring, and want to have a method 
> for the preparation by profiling information as much as possible.
>
> it this enough for your question?

I want some more detailed information about how these stats are used?
Why isn't per-node pgdemote_xxx counter enough?

--
Best Regards,
Huang, Ying

> Thanks,
>
>> 
>> 
>> >>>   	mod_node_page_state(NODE_DATA(target_nid),
>> >>> -		    PGDEMOTE_KSWAPD + reclaimer_offset(),
>> nr_succeeded);
>> >>> +		    PGDEMOTE_DST_KSWAPD + reclaimer_offset(),
>> nr_succeeded);
>> 
>> But if the *target_nid* is only indicate the preferred node, this accounting
>> maybe not accurate.
>> 
>> 
>> Thanks
>> Zhijian
>> 
>> >
>> > --
>> > Best Regards,
>> > Huang, Ying
>> >
>> >> Signed-off-by: Li Zhijian <lizhijian@xxxxxxxxxxx>
>> >> ---
>> >> RFC: their names are open to discussion, maybe pgdemote_from/to_*
>> >> Another defect of this patch is that, SUM(pgdemote_src_*) is always same
>> >> as SUM(pgdemote_dst_*) in the global stats, shall we hide one of them.
>> >> ---
>> >>   include/linux/mmzone.h |  9 ++++++---
>> >>   mm/vmscan.c            | 13 ++++++++++---
>> >>   mm/vmstat.c            |  9 ++++++---
>> >>   3 files changed, 22 insertions(+), 9 deletions(-)
>> >>
>> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> >> index ad0309eea850..a6140d894bec 100644
>> >> --- a/include/linux/mmzone.h
>> >> +++ b/include/linux/mmzone.h
>> >> @@ -207,9 +207,12 @@ enum node_stat_item {
>> >>   	PGPROMOTE_SUCCESS,	/* promote successfully */
>> >>   	PGPROMOTE_CANDIDATE,	/* candidate pages to promote */
>> >>   	/* PGDEMOTE_*: pages demoted */
>> >> -	PGDEMOTE_KSWAPD,
>> >> -	PGDEMOTE_DIRECT,
>> >> -	PGDEMOTE_KHUGEPAGED,
>> >> +	PGDEMOTE_SRC_KSWAPD,
>> >> +	PGDEMOTE_SRC_DIRECT,
>> >> +	PGDEMOTE_SRC_KHUGEPAGED,
>> >> +	PGDEMOTE_DST_KSWAPD,
>> >> +	PGDEMOTE_DST_DIRECT,
>> >> +	PGDEMOTE_DST_KHUGEPAGED,
>> >>   #endif
>> >>   	NR_VM_NODE_STAT_ITEMS
>> >>   };
>> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> >> index 2f1fb4ec3235..55d2287d7150 100644
>> >> --- a/mm/vmscan.c
>> >> +++ b/mm/vmscan.c
>> >> @@ -1111,13 +1111,18 @@ void drop_slab(void)
>> >>   static int reclaimer_offset(void)
>> >>   {
>> >>   	BUILD_BUG_ON(PGSTEAL_DIRECT - PGSTEAL_KSWAPD !=
>> >> -			PGDEMOTE_DIRECT - PGDEMOTE_KSWAPD);
>> >> +			PGDEMOTE_SRC_DIRECT -
>> PGDEMOTE_SRC_KSWAPD);
>> >>   	BUILD_BUG_ON(PGSTEAL_DIRECT - PGSTEAL_KSWAPD !=
>> >>   			PGSCAN_DIRECT - PGSCAN_KSWAPD);
>> >>   	BUILD_BUG_ON(PGSTEAL_KHUGEPAGED - PGSTEAL_KSWAPD !=
>> >> -			PGDEMOTE_KHUGEPAGED -
>> PGDEMOTE_KSWAPD);
>> >> +			PGDEMOTE_SRC_KHUGEPAGED -
>> PGDEMOTE_SRC_KSWAPD);
>> >>   	BUILD_BUG_ON(PGSTEAL_KHUGEPAGED - PGSTEAL_KSWAPD !=
>> >>   			PGSCAN_KHUGEPAGED - PGSCAN_KSWAPD);
>> >> +	BUILD_BUG_ON(PGDEMOTE_SRC_DIRECT -
>> PGDEMOTE_SRC_KSWAPD !=
>> >> +			PGDEMOTE_DST_DIRECT -
>> PGDEMOTE_DST_KSWAPD);
>> >> +	BUILD_BUG_ON(PGDEMOTE_SRC_KHUGEPAGED -
>> PGDEMOTE_SRC_KSWAPD !=
>> >> +			PGDEMOTE_DST_KHUGEPAGED -
>> PGDEMOTE_DST_KSWAPD);
>> >> +
>> >>
>> >>   	if (current_is_kswapd())
>> >>   		return 0;
>> >> @@ -1678,8 +1683,10 @@ static unsigned int demote_folio_list(struct
>> list_head *demote_folios,
>> >>   		      (unsigned long)&mtc, MIGRATE_ASYNC,
>> MR_DEMOTION,
>> >>   		      &nr_succeeded);
>> >>
>> >> +	mod_node_page_state(pgdat,
>> >> +		    PGDEMOTE_SRC_KSWAPD + reclaimer_offset(),
>> nr_succeeded);
>> >>   	mod_node_page_state(NODE_DATA(target_nid),
>> >> -		    PGDEMOTE_KSWAPD + reclaimer_offset(),
>> nr_succeeded);
>> >> +		    PGDEMOTE_DST_KSWAPD + reclaimer_offset(),
>> nr_succeeded);
>> >>
>> >>   	return nr_succeeded;
>> >>   }
>> >> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> >> index f141c48c39e4..63f106a5e008 100644
>> >> --- a/mm/vmstat.c
>> >> +++ b/mm/vmstat.c
>> >> @@ -1244,9 +1244,12 @@ const char * const vmstat_text[] = {
>> >>   #ifdef CONFIG_NUMA_BALANCING
>> >>   	"pgpromote_success",
>> >>   	"pgpromote_candidate",
>> >> -	"pgdemote_kswapd",
>> >> -	"pgdemote_direct",
>> >> -	"pgdemote_khugepaged",
>> >> +	"pgdemote_src_kswapd",
>> >> +	"pgdemote_src_direct",
>> >> +	"pgdemote_src_khugepaged",
>> >> +	"pgdemote_dst_kswapd",
>> >> +	"pgdemote_dst_direct",
>> >> +	"pgdemote_dst_khugepaged",
>> >>   #endif
>> >>
>> >>   	/* enum writeback_stat_item counters */




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux