On Mon, Jul 20, 2020 at 10:03:49AM +0200, Michal Hocko wrote:
> On Tue 14-07-20 10:39:20, Roman Gushchin wrote:
> > I've noticed a number of warnings like "vmstat_refresh: nr_free_cma
> > -5" or "vmstat_refresh: nr_zone_write_pending -11" on our production
> > hosts. The numbers of these warnings were relatively low and stable,
> > so it didn't look like we are systematically leaking the counters.
> > The corresponding vmstat counters also looked sane.
> >
> > These warnings are generated by the vmstat_refresh() function, which
> > assumes that atomic zone and numa counters can't go below zero.
> > However, on a SMP machine it's not quite right: due to per-cpu
> > caching it can in theory be as low as -(zone threshold) * NR_CPUs.
> >
> > For instance, let's say all cma pages are in use and NR_FREE_CMA_PAGES
> > reached 0. Then we've reclaimed a small number of cma pages on each
> > CPU except CPU0, so that most percpu NR_FREE_CMA_PAGES counters are
> > slightly positive (the atomic counter is still 0). Then somebody on
> > CPU0 consumes all these pages. The number of pages can easily exceed
> > the threshold and a negative value will be committed to the atomic
> > counter.
> >
> > To fix the problem and avoid generating false warnings, let's just
> > relax the condition and warn only if the value is less than minus
> > the maximum theoretically possible drift value, which is 125 *
> > number of online CPUs. It will still allow to catch systematic leaks,
> > but will not generate bogus warnings.
> >
> > Signed-off-by: Roman Gushchin <guro@xxxxxx>
> > Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>
> One minor nit which can be handled by a separate patch but now that you
> are touching this code...

Thank you!
>
> > ---
> >  Documentation/admin-guide/sysctl/vm.rst |  4 ++--
> >  mm/vmstat.c                             | 30 ++++++++++++++++---------
> >  2 files changed, 21 insertions(+), 13 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> > index 4b9d2e8e9142..95fb80d0c606 100644
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -822,8 +822,8 @@ e.g. cat /proc/sys/vm/stat_refresh /proc/meminfo
> >
> >  As a side-effect, it also checks for negative totals (elsewhere reported
> >  as 0) and "fails" with EINVAL if any are found, with a warning in dmesg.
> > -(At time of writing, a few stats are known sometimes to be found negative,
> > -with no ill effects: errors and warnings on these stats are suppressed.)
> > +(On a SMP machine some stats can temporarily become negative, with no ill
> > +effects: errors and warnings on these stats are suppressed.)
> >
> >
> >  numa_stat
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index a21140373edb..8f0ef8aaf8ee 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -169,6 +169,8 @@ EXPORT_SYMBOL(vm_node_stat);
> >
> >  #ifdef CONFIG_SMP
> >
> > +#define MAX_THRESHOLD 125
>
> This would deserve a comment. 88f5acf88ae6a didn't really explain why
> this specific value has been selected but the specific value shouldn't
> really matter much. I would go with the following at least.
> "
> Maximum sync threshold for per-cpu vmstat counters.
> "

Agree.

Below is the diff to be squashed into the original patch.

Thanks!

--

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 08e415e0a15d..ddc59b533599 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -167,6 +167,9 @@ EXPORT_SYMBOL(vm_zone_stat);
 EXPORT_SYMBOL(vm_numa_stat);
 EXPORT_SYMBOL(vm_node_stat);
 
+/*
+ * Maximum sync threshold for per-cpu vmstat counters.
+ */
 #ifdef CONFIG_SMP
 #define MAX_THRESHOLD 125
 #else