MengEn Sun <mengensun88@xxxxxxxxx> writes: > Thank you for your suggestion. I understand and am ready to make > some changes > >> >> Have verified the issue with some test? If not, I suggest you to do >> that. >> > > I have conducted tests: > Applying this patch or not does not have a significant impact on the > test results. I don't expect some measurable performance difference with the patch. If we can observe that the PCP size isn't tuned down to high_min before and is after, that should be a valid test result to show the value of the patch. Can you try that? The PCP size can be observed via /proc/zoneinfo. > perhaps my testing was not thorough enough. #^_^ > > But, the logic of the code is like following: > CPU0 CPUx > ---- ----- > T0: vmstat_work is pending > T1: vmstat_shepherd > check vmstat_work > and do nothing > T2: vmstat_work is in unpending state. > > T3: alloc many pages > T4: free all the pages allocated at T3 > T5: entry NOHZ, flushing all zonestats > and nodestats > T6: next vmstat_shepherd fired > > In my opinion, there are indeed some issues. I'm not sure if there's > something I haven't understood? > > > By the way, > There are two other questions for me: > Q1: > Vmstat_work is a **deferreable work** So, It may be delayed for a long time > by NOHZ. As a result, "vmstat_update() may not be executed once every > second in the above scenario. Therefore, I'm not sure if using a deferrable > work to reduce pcp->high is appropriate. In my tests, if I don't use > deferrable work, it takes about a minute to reduce high to high_min, but > using deferrable work may take several minutes to reduce high to high_min. It's not a big issue to take longer time to decay pcp->high. > Q2: > On a big machine, for example, with 1TB of memory, the default maximum > memory on PCP can be 1TB * 0.125. > This portion of memory is not accounted for in MemFree in /proc/meminfo. > Users can see this portion of memory from /proc/zoneinfo, but the memory > reported by the `free` command is reduced. > can we include the PCP memory in the MemFree statistic in /proc/meminfo? This has been discussed before. https://lore.kernel.org/linux-mm/20220816084426.135528-1-wangkefeng.wang@xxxxxxxxxx/ https://lore.kernel.org/linux-mm/20240830014453.3070909-1-mawupeng1@xxxxxxxxxx/ >> > While, This seems to be fine: >> > - if freeing and allocating memory occur later, it may the >> > high_max may be adjust automatically >> > - If memory is tight, the memory reclamation process will >> > release the pcp >> >> This could be a real issue for me. > > Thanks, I will test more carefully for those issue > >> >> > Whatever, we make vmstat_shepherd to checking whether we need >> > decay pcp high_max, and fire pcp_decay_high early if we need. >> > >> > Fixes: 51a755c56dc0 ("mm: tune PCP high automatically") >> > Reviewed-by: Jinliang Zheng <alexjlzheng@xxxxxxxxxxx> >> > Signed-off-by: MengEn Sun <mengensun@xxxxxxxxxxx> >> > --- >> > changelog: >> > v1: https://lore.kernel.org/lkml/20241012154328.015f57635566485ad60712f3@xxxxxxxxxxxxxxxxxxxx/T/#t >> > v2: Make the commit message clearer by adding some comments. >> > --- >> > mm/vmstat.c | 9 +++++++++ >> > 1 file changed, 9 insertions(+) >> > >> > diff --git a/mm/vmstat.c b/mm/vmstat.c >> > index 1917c034c045..07b494b06872 100644 >> > --- a/mm/vmstat.c >> > +++ b/mm/vmstat.c >> > @@ -2024,8 +2024,17 @@ static bool need_update(int cpu) >> > >> > for_each_populated_zone(zone) { >> > struct per_cpu_zonestat *pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); >> > + struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); >> > struct per_cpu_nodestat *n; >> > >> > + /* per_cpu_nodestats and per_cpu_zonestats maybe flush when cpu >> > + * entering NOHZ full, see quiet_vmstat. so, we check pcp >> > + * high_{min,max} to determine whether it is necessary to run >> > + * decay_pcp_high on the corresponding CPU >> > + */ >> >> Please follow the comments coding style. >> >> /* >> * comments line 1 >> * comments line 2 >> */ >> > > Thank you for your suggestion. I understand and am ready to make > some changes > >> > + if (pcp->high_max > pcp->high_min) >> > + return true; >> > + >> >> We don't tune pcp->high_max/min in fact. Instead, we tune pcp->high. >> Your code may make need_update() return true in most cases. > > You are right, using high_max is incorrect. May i use pcp->high > pcp->high_min? > >> >> > /* >> > * The fast way of checking if there are any vmstat diffs. >> > */ -- Best Regards, Huang, Ying