"Yasunori Gotou (Fujitsu)" <y-goto@xxxxxxxxxxx> writes: > Hello, > >> On 02/11/2023 13:45, Huang, Ying wrote: >> > Li Zhijian <lizhijian@xxxxxxxxxxx> writes: >> > >> >> pgdemote_src_*: pages demoted from this node. >> >> pgdemote_dst_*: pages demoted to this node. >> >> >> >> So that we are able to know their demotion per-node stats by checking this. >> >> >> >> In the environment, node0 and node1 are DRAM, node3 is PMEM. >> >> >> >> Global stats: >> >> $ grep -E 'demote' /proc/vmstat >> >> pgdemote_src_kswapd 130155 >> >> pgdemote_src_direct 113497 >> >> pgdemote_src_khugepaged 0 >> >> pgdemote_dst_kswapd 130155 >> >> pgdemote_dst_direct 113497 >> >> pgdemote_dst_khugepaged 0 >> >> >> >> Per-node stats: >> >> $ grep demote /sys/devices/system/node/node0/vmstat >> >> pgdemote_src_kswapd 68454 >> >> pgdemote_src_direct 83431 >> >> pgdemote_src_khugepaged 0 >> >> pgdemote_dst_kswapd 0 >> >> pgdemote_dst_direct 0 >> >> pgdemote_dst_khugepaged 0 >> >> >> >> $ grep demote /sys/devices/system/node/node1/vmstat >> >> pgdemote_src_kswapd 185834 >> >> pgdemote_src_direct 30066 >> >> pgdemote_src_khugepaged 0 >> >> pgdemote_dst_kswapd 0 >> >> pgdemote_dst_direct 0 >> >> pgdemote_dst_khugepaged 0 >> >> >> >> $ grep demote /sys/devices/system/node/node3/vmstat >> >> pgdemote_src_kswapd 0 >> >> pgdemote_src_direct 0 >> >> pgdemote_src_khugepaged 0 >> >> pgdemote_dst_kswapd 254288 >> >> pgdemote_dst_direct 113497 >> >> pgdemote_dst_khugepaged 0 >> >> >> >> From above stats, we know node3 is the demotion destination which one >> >> the node0 and node1 will demote to. >> > >> > Why do we need these information? Do you have some use case? >> >> I recall our customers have mentioned that they want to know how much the >> memory is demoted >> to the CXL memory device in a specific period. > > I'll mention about it more. > > I had a conversation with one of our customers. He expressed a desire for more detailed > profile information to analyze the behavior of demotion (and promotion) when > his workloads are executed. > If the results are not satisfactory for his workloads, he wants to tune his servers for his workloads > with these profiles. > Additionally, depending on the results, he may want to change his server configuration. > For example, he may want to buy more expensive DDR memories rather than cheaper CXL memory. > > In my impression, our customers seems to think that CXL memory is NOT as reliable as DDR memory yet. > Therefore, they want to prepare for the new world that CXL will bring, and want to have a method > for the preparation by profiling information as much as possible. > > it this enough for your question? I want some more detailed information about how these stats are used? Why isn't per-node pgdemote_xxx counter enough? -- Best Regards, Huang, Ying > Thanks, > >> >> >> >>> mod_node_page_state(NODE_DATA(target_nid), >> >>> - PGDEMOTE_KSWAPD + reclaimer_offset(), >> nr_succeeded); >> >>> + PGDEMOTE_DST_KSWAPD + reclaimer_offset(), >> nr_succeeded); >> >> But if the *target_nid* is only indicate the preferred node, this accounting >> maybe not accurate. >> >> >> Thanks >> Zhijian >> >> > >> > -- >> > Best Regards, >> > Huang, Ying >> > >> >> Signed-off-by: Li Zhijian <lizhijian@xxxxxxxxxxx> >> >> --- >> >> RFC: their names are open to discussion, maybe pgdemote_from/to_* >> >> Another defect of this patch is that, SUM(pgdemote_src_*) is always same >> >> as SUM(pgdemote_dst_*) in the global stats, shall we hide one of them. >> >> --- >> >> include/linux/mmzone.h | 9 ++++++--- >> >> mm/vmscan.c | 13 ++++++++++--- >> >> mm/vmstat.c | 9 ++++++--- >> >> 3 files changed, 22 insertions(+), 9 deletions(-) >> >> >> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h >> >> index ad0309eea850..a6140d894bec 100644 >> >> --- a/include/linux/mmzone.h >> >> +++ b/include/linux/mmzone.h >> >> @@ -207,9 +207,12 @@ enum node_stat_item { >> >> PGPROMOTE_SUCCESS, /* promote successfully */ >> >> PGPROMOTE_CANDIDATE, /* candidate pages to promote */ >> >> /* PGDEMOTE_*: pages demoted */ >> >> - PGDEMOTE_KSWAPD, >> >> - PGDEMOTE_DIRECT, >> >> - PGDEMOTE_KHUGEPAGED, >> >> + PGDEMOTE_SRC_KSWAPD, >> >> + PGDEMOTE_SRC_DIRECT, >> >> + PGDEMOTE_SRC_KHUGEPAGED, >> >> + PGDEMOTE_DST_KSWAPD, >> >> + PGDEMOTE_DST_DIRECT, >> >> + PGDEMOTE_DST_KHUGEPAGED, >> >> #endif >> >> NR_VM_NODE_STAT_ITEMS >> >> }; >> >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> >> index 2f1fb4ec3235..55d2287d7150 100644 >> >> --- a/mm/vmscan.c >> >> +++ b/mm/vmscan.c >> >> @@ -1111,13 +1111,18 @@ void drop_slab(void) >> >> static int reclaimer_offset(void) >> >> { >> >> BUILD_BUG_ON(PGSTEAL_DIRECT - PGSTEAL_KSWAPD != >> >> - PGDEMOTE_DIRECT - PGDEMOTE_KSWAPD); >> >> + PGDEMOTE_SRC_DIRECT - >> PGDEMOTE_SRC_KSWAPD); >> >> BUILD_BUG_ON(PGSTEAL_DIRECT - PGSTEAL_KSWAPD != >> >> PGSCAN_DIRECT - PGSCAN_KSWAPD); >> >> BUILD_BUG_ON(PGSTEAL_KHUGEPAGED - PGSTEAL_KSWAPD != >> >> - PGDEMOTE_KHUGEPAGED - >> PGDEMOTE_KSWAPD); >> >> + PGDEMOTE_SRC_KHUGEPAGED - >> PGDEMOTE_SRC_KSWAPD); >> >> BUILD_BUG_ON(PGSTEAL_KHUGEPAGED - PGSTEAL_KSWAPD != >> >> PGSCAN_KHUGEPAGED - PGSCAN_KSWAPD); >> >> + BUILD_BUG_ON(PGDEMOTE_SRC_DIRECT - >> PGDEMOTE_SRC_KSWAPD != >> >> + PGDEMOTE_DST_DIRECT - >> PGDEMOTE_DST_KSWAPD); >> >> + BUILD_BUG_ON(PGDEMOTE_SRC_KHUGEPAGED - >> PGDEMOTE_SRC_KSWAPD != >> >> + PGDEMOTE_DST_KHUGEPAGED - >> PGDEMOTE_DST_KSWAPD); >> >> + >> >> >> >> if (current_is_kswapd()) >> >> return 0; >> >> @@ -1678,8 +1683,10 @@ static unsigned int demote_folio_list(struct >> list_head *demote_folios, >> >> (unsigned long)&mtc, MIGRATE_ASYNC, >> MR_DEMOTION, >> >> &nr_succeeded); >> >> >> >> + mod_node_page_state(pgdat, >> >> + PGDEMOTE_SRC_KSWAPD + reclaimer_offset(), >> nr_succeeded); >> >> mod_node_page_state(NODE_DATA(target_nid), >> >> - PGDEMOTE_KSWAPD + reclaimer_offset(), >> nr_succeeded); >> >> + PGDEMOTE_DST_KSWAPD + reclaimer_offset(), >> nr_succeeded); >> >> >> >> return nr_succeeded; >> >> } >> >> diff --git a/mm/vmstat.c b/mm/vmstat.c >> >> index f141c48c39e4..63f106a5e008 100644 >> >> --- a/mm/vmstat.c >> >> +++ b/mm/vmstat.c >> >> @@ -1244,9 +1244,12 @@ const char * const vmstat_text[] = { >> >> #ifdef CONFIG_NUMA_BALANCING >> >> "pgpromote_success", >> >> "pgpromote_candidate", >> >> - "pgdemote_kswapd", >> >> - "pgdemote_direct", >> >> - "pgdemote_khugepaged", >> >> + "pgdemote_src_kswapd", >> >> + "pgdemote_src_direct", >> >> + "pgdemote_src_khugepaged", >> >> + "pgdemote_dst_kswapd", >> >> + "pgdemote_dst_direct", >> >> + "pgdemote_dst_khugepaged", >> >> #endif >> >> >> >> /* enum writeback_stat_item counters */