On Thu, Feb 25, 2021 at 03:14:03PM -0800, Hugh Dickins wrote:
> vmstat_refresh() can occasionally catch nr_zone_write_pending and
> nr_writeback when they are transiently negative. The reason is partly
> that the interrupt which decrements them in test_clear_page_writeback()
> can come in before __test_set_page_writeback() got to increment them;
> but transient negatives are still seen even when that is prevented, and
> we have not yet resolved why (Roman believes that it is an unavoidable
> consequence of the refresh scheduled on each cpu). But those stats are
> not buggy, they have never been seen to drift away from 0 permanently:
> so just avoid the annoyance of showing a warning on them.
>
> Similarly avoid showing a warning on nr_free_cma: CMA users have seen
> that one reported negative from /proc/sys/vm/stat_refresh too, but it
> does drift away permanently: I believe that's because its incrementation
> and decrementation are decided by page migratetype, but the migratetype
> of a pageblock is not guaranteed to be constant.
>
> Use switch statements so we can most easily add or remove cases later.

I'm OK with the code, but I can't fully agree with the commit log. I don't
think there is any mystery around negative values. Let me copy-paste the
explanation from my original patch:

These warnings* are generated by the vmstat_refresh() function, which
assumes that atomic zone and numa counters can't go below zero. However,
on a SMP machine it's not quite right: due to per-cpu caching it can in
theory be as low as -(zone threshold) * NR_CPUs.

For instance, let's say all cma pages are in use and NR_FREE_CMA_PAGES
reached 0. Then we've reclaimed a small number of cma pages on each CPU
except CPU0, so that most percpu NR_FREE_CMA_PAGES counters are slightly
positive (the atomic counter is still 0). Then somebody on CPU0 consumes
all these pages. The number of pages can easily exceed the threshold and
a negative value will be committed to the atomic counter.
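
To make the scenario above concrete, here is a minimal userspace sketch of
it. It is not the kernel code: mod_state(), true_count(), run_scenario(),
pcp_diff[] and global_count are hypothetical names, and NR_CPUS = 4 and
THRESHOLD = 32 are arbitrary values picked for illustration. The only
behaviour it tries to model is that a per-cpu delta is folded into the
shared counter once its magnitude exceeds the threshold.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_CPUS   4	/* hypothetical CPU count */
#define THRESHOLD 32	/* hypothetical per-cpu stat threshold */

static long global_count;	/* stands in for the atomic counter */
static long pcp_diff[NR_CPUS];	/* stands in for the per-cpu deltas */

/* Apply delta on one CPU; fold into the global only past the threshold. */
static void mod_state(int cpu, long delta)
{
	long x;

	assert(cpu >= 0 && cpu < NR_CPUS);
	x = pcp_diff[cpu] + delta;
	if (labs(x) > THRESHOLD) {
		global_count += x;
		x = 0;
	}
	pcp_diff[cpu] = x;
}

/* Global counter plus every delta still cached per-cpu. */
static long true_count(void)
{
	long sum = global_count;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += pcp_diff[cpu];
	return sum;
}

/* Replay the scenario; returns the global counter afterwards. */
static long run_scenario(void)
{
	int cpu;

	global_count = 0;
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		pcp_diff[cpu] = 0;

	/* Every CPU except CPU0 frees 30 pages: below threshold, stays local. */
	for (cpu = 1; cpu < NR_CPUS; cpu++)
		mod_state(cpu, 30);

	/* CPU0 consumes all of those pages in one go. */
	mod_state(0, -(30L * (NR_CPUS - 1)));

	return global_count;
}

#ifdef DEMO_MAIN	/* build with -DDEMO_MAIN for a standalone program */
int main(void)
{
	long global = run_scenario();

	printf("global=%ld true=%ld\n", global, true_count());
	return 0;
}
#endif
```

With these values the shared counter ends up at -90 while the sum including
the per-cpu deltas is 0, so a reader looking only at the shared counter
sees a negative value even though nothing has been miscounted.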

* warnings about negative NR_FREE_CMA_PAGES

Actually, the same is almost true for ANY other counter. What sets the CMA,
dirty and write pending counters apart is that they can reach 0 under
normal conditions. Other counters usually do not get small enough for
negative values to show up on a reasonably sized machine.

Does it make sense?

>
> Link: https://lore.kernel.org/linux-mm/20200714173747.3315771-1-guro@xxxxxx/
> Reported-by: Roman Gushchin <guro@xxxxxx>
> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> ---
>
>  mm/vmstat.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> --- vmstat2/mm/vmstat.c	2021-02-25 11:56:18.000000000 -0800
> +++ vmstat3/mm/vmstat.c	2021-02-25 12:42:15.000000000 -0800
> @@ -1840,6 +1840,14 @@ int vmstat_refresh(struct ctl_table *tab
>  	if (err)
>  		return err;
>  	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
> +		/*
> +		 * Skip checking stats known to go negative occasionally.
> +		 */
> +		switch (i) {
> +		case NR_ZONE_WRITE_PENDING:
> +		case NR_FREE_CMA_PAGES:
> +			continue;
> +		}
>  		val = atomic_long_read(&vm_zone_stat[i]);
>  		if (val < 0) {
>  			pr_warn("%s: %s %ld\n",
> @@ -1856,6 +1864,13 @@ int vmstat_refresh(struct ctl_table *tab
>  	}
>  #endif
>  	for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
> +		/*
> +		 * Skip checking stats known to go negative occasionally.
> +		 */
> +		switch (i) {
> +		case NR_WRITEBACK:
> +			continue;
> +		}
>  		val = atomic_long_read(&vm_node_stat[i]);
>  		if (val < 0) {
>  			pr_warn("%s: %s %ld\n",