> >> > Hello,
> >> >
> >> >> On 02/11/2023 13:45, Huang, Ying wrote:
> >> >> > Li Zhijian <lizhijian@xxxxxxxxxxx> writes:
> >> >> >
> >> >> >> pgdemote_src_*: pages demoted from this node.
> >> >> >> pgdemote_dst_*: pages demoted to this node.
> >> >> >>
> >> >> >> So that we are able to know their demotion per-node stats by
> >> >> >> checking this.
> >> >> >>
> >> >> >> In the environment, node0 and node1 are DRAM, node3 is PMEM.
> >> >> >>
> >> >> >> Global stats:
> >> >> >> $ grep -E 'demote' /proc/vmstat
> >> >> >> pgdemote_src_kswapd 130155
> >> >> >> pgdemote_src_direct 113497
> >> >> >> pgdemote_src_khugepaged 0
> >> >> >> pgdemote_dst_kswapd 130155
> >> >> >> pgdemote_dst_direct 113497
> >> >> >> pgdemote_dst_khugepaged 0
> >> >> >>
> >> >> >> Per-node stats:
> >> >> >> $ grep demote /sys/devices/system/node/node0/vmstat
> >> >> >> pgdemote_src_kswapd 68454
> >> >> >> pgdemote_src_direct 83431
> >> >> >> pgdemote_src_khugepaged 0
> >> >> >> pgdemote_dst_kswapd 0
> >> >> >> pgdemote_dst_direct 0
> >> >> >> pgdemote_dst_khugepaged 0
> >> >> >>
> >> >> >> $ grep demote /sys/devices/system/node/node1/vmstat
> >> >> >> pgdemote_src_kswapd 185834
> >> >> >> pgdemote_src_direct 30066
> >> >> >> pgdemote_src_khugepaged 0
> >> >> >> pgdemote_dst_kswapd 0
> >> >> >> pgdemote_dst_direct 0
> >> >> >> pgdemote_dst_khugepaged 0
> >> >> >>
> >> >> >> $ grep demote /sys/devices/system/node/node3/vmstat
> >> >> >> pgdemote_src_kswapd 0
> >> >> >> pgdemote_src_direct 0
> >> >> >> pgdemote_src_khugepaged 0
> >> >> >> pgdemote_dst_kswapd 254288
> >> >> >> pgdemote_dst_direct 113497
> >> >> >> pgdemote_dst_khugepaged 0
> >> >> >>
> >> >> >> From above stats, we know node3 is the demotion destination
> >> >> >> which one the node0 and node1 will demote to.
> >> >> >
> >> >> > Why do we need these information? Do you have some use case?
> >> >>
> >> >> I recall our customers have mentioned that they want to know how
> >> >> much the memory is demoted to the CXL memory device in a specific
> >> >> period.
> >> >
> >> > I'll mention about it more.
> >> >
> >> > I had a conversation with one of our customers. He expressed a
> >> > desire for more detailed profile information to analyze the
> >> > behavior of demotion (and promotion) when his workloads are executed.
> >> > If the results are not satisfactory for his workloads, he wants to
> >> > tune his servers for his workloads with these profiles.
> >> > Additionally, depending on the results, he may want to change his
> >> > server configuration.
> >> > For example, he may want to buy more expensive DDR memories rather
> >> > than cheaper CXL memory.
> >> >
> >> > In my impression, our customers seem to think that CXL memory is
> >> > NOT as reliable as DDR memory yet.
> >> > Therefore, they want to prepare for the new world that CXL will
> >> > bring, and want to have a method for the preparation by profiling
> >> > information as much as possible.
> >> >
> >> > Is this enough for your question?
> >>
> >> I want some more detailed information about how these stats are used?
> >> Why isn't per-node pgdemote_xxx counter enough?
> >
> > I rechecked the customer's original request.
> >
> > - If a memory area is demoted to a CXL memory node, he wanted to
> >   analyze how it affects performance of their workload, such as
> >   latency. He wanted to use CXL Node memory usage as basic
> >   information for the analysis.
> >
> > - If he notices that demotion occurs well on a server and CXL
> >   memories are used 85% constantly, he may want to add DDR DRAM or
> >   select some other ways to avoid demotion.
> >   (His image is likely Swap free/used.)
> >   IIRC, demotion target is not spread to all of the CXL memory node,
> >   right?
> >   Then, he needs to know how CXL memory is occupied by demoted
> >   memory.
> >
> > If I misunderstand something, or you have any better idea, please let
> > us know. I'll talk with him again. (It will be next week...)
> >
> To check CXL memory usage, /proc/PID/numa_maps,
> /sys/fs/cgroup/CGROUP/memory.numa_stat, and
> /sys/devices/system/node/nodeN/meminfo can be used for process, cgroup,
> and NUMA node respectively. Is this enough?

Thank you for your idea.
We will investigate your idea and talk with our customer.
Please wait.

Thanks,
---
Yasunori Goto

>
> --
> Best Regards,
> Huang, Ying
>
> >> >
> >> >>
> >> >>
> >> >> >>>   mod_node_page_state(NODE_DATA(target_nid),
> >> >> >>> -             PGDEMOTE_KSWAPD + reclaimer_offset(), nr_succeeded);
> >> >> >>> +             PGDEMOTE_DST_KSWAPD + reclaimer_offset(), nr_succeeded);
> >> >>
> >> >> But if the *target_nid* is only indicate the preferred node, this
> >> >> accounting maybe not accurate.
> >> >>
> > [snip]
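
For the "how much was demoted in a specific period" use case discussed above, a minimal sketch of how the per-node counters could be consumed from user space, assuming the pgdemote_src_*/pgdemote_dst_* names proposed in this patch (they are not guaranteed to exist on a given kernel). The helper names parse_vmstat, demoted_pages, demoted_in_period and read_node_vmstat are hypothetical and only for illustration.

```python
from pathlib import Path


def parse_vmstat(text):
    """Parse 'name value' lines, the layout used by /proc/vmstat and
    /sys/devices/system/node/nodeN/vmstat, into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly "<name> <non-negative integer>".
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def demoted_pages(counters, prefix="pgdemote_dst_"):
    """Sum all counters with the given prefix (kswapd, direct, khugepaged).
    The prefix is an assumption taken from the patch under discussion."""
    return sum(v for k, v in counters.items() if k.startswith(prefix))


def demoted_in_period(before, after, prefix="pgdemote_dst_"):
    """Pages demoted between two snapshots of the same node's counters."""
    return demoted_pages(after, prefix) - demoted_pages(before, prefix)


def read_node_vmstat(node):
    """Read one node's counters from sysfs (hypothetical helper)."""
    path = Path(f"/sys/devices/system/node/node{node}/vmstat")
    return parse_vmstat(path.read_text())


# Sample text copied from the node3 output quoted in this thread.
NODE3_SAMPLE = """\
pgdemote_src_kswapd 0
pgdemote_src_direct 0
pgdemote_src_khugepaged 0
pgdemote_dst_kswapd 254288
pgdemote_dst_direct 113497
pgdemote_dst_khugepaged 0
"""
```

Taking read_node_vmstat(3) at the start and end of a workload run and passing both snapshots to demoted_in_period() would give the number of pages demoted to node3 in that window. It does not say how much of the node is currently occupied by demoted memory, which is the separate question raised above.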