Thank you for your suggestion. I understand and am ready to make some changes > > Have verified the issue with some test? If not, I suggest you to do > that. > I have conducted tests: Applying this patch or not does not have a significant impact on the test results. perhaps my testing was not thorough enough. #^_^ But, the logic of the code is like following: CPU0 CPUx ---- ----- T0: vmstat_work is pending T1: vmstat_shepherd check vmstat_work and do nothing T2: vmstat_work is in unpending state. T3: alloc many pages T4: free all the pages allocated at T3 T5: entry NOHZ, flushing all zonestats and nodestats T6: next vmstat_shepherd fired In my opinion, there are indeed some issues. I'm not sure if there's something I haven't understood? By the way, There are two other questions for me: Q1: Vmstat_work is a **deferreable work** So, It may be delayed for a long time by NOHZ. As a result, "vmstat_update() may not be executed once every second in the above scenario. Therefore, I'm not sure if using a deferrable work to reduce pcp->high is appropriate. In my tests, if I don't use deferrable work, it takes about a minute to reduce high to high_min, but using deferrable work may take several minutes to reduce high to high_min. Q2: On a big machine, for example, with 1TB of memory, the default maximum memory on PCP can be 1TB * 0.125. This portion of memory is not accounted for in MemFree in /proc/meminfo. Users can see this portion of memory from /proc/zoneinfo, but the memory reported by the `free` command is reduced. can we include the PCP memory in the MemFree statistic in /proc/meminfo? > > While, This seems to be fine: > > - if freeing and allocating memory occur later, it may the > > high_max may be adjust automatically > > - If memory is tight, the memory reclamation process will > > release the pcp > > This could be a real issue for me. Thanks, I will test more carefully for those issue > > > Whatever, we make vmstat_shepherd to checking whether we need > > decay pcp high_max, and fire pcp_decay_high early if we need. > > > > Fixes: 51a755c56dc0 ("mm: tune PCP high automatically") > > Reviewed-by: Jinliang Zheng <alexjlzheng@xxxxxxxxxxx> > > Signed-off-by: MengEn Sun <mengensun@xxxxxxxxxxx> > > --- > > changelog: > > v1: https://lore.kernel.org/lkml/20241012154328.015f57635566485ad60712f3@xxxxxxxxxxxxxxxxxxxx/T/#t > > v2: Make the commit message clearer by adding some comments. > > --- > > mm/vmstat.c | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > index 1917c034c045..07b494b06872 100644 > > --- a/mm/vmstat.c > > +++ b/mm/vmstat.c > > @@ -2024,8 +2024,17 @@ static bool need_update(int cpu) > > > > for_each_populated_zone(zone) { > > struct per_cpu_zonestat *pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); > > + struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); > > struct per_cpu_nodestat *n; > > > > + /* per_cpu_nodestats and per_cpu_zonestats maybe flush when cpu > > + * entering NOHZ full, see quiet_vmstat. so, we check pcp > > + * high_{min,max} to determine whether it is necessary to run > > + * decay_pcp_high on the corresponding CPU > > + */ > > Please follow the comments coding style. > > /* > * comments line 1 > * comments line 2 > */ > Thank you for your suggestion. I understand and am ready to make some changes > > + if (pcp->high_max > pcp->high_min) > > + return true; > > + > > We don't tune pcp->high_max/min in fact. Instead, we tune pcp->high. > Your code may make need_update() return true in most cases. You are right, using high_max is incorrect. May i use pcp->high > pcp->high_min? > > > /* > > * The fast way of checking if there are any vmstat diffs. > > */ > > -- > Best Regards, > Huang, Ying Best Regards, MengEn, Sun