On Thu, 6 Aug 2020, Andrew Morton wrote: > On Thu, 6 Aug 2020 16:38:04 -0700 Roman Gushchin <guro@xxxxxx> wrote: August, yikes, I thought it was much more recent. > > > it seems that Hugh and me haven't reached a consensus here. > > Can, you, please, not merge this patch into 5.9, so we would have > > more time to find a solution, acceptable for all? > > No probs. I already had a big red asterisk on it ;) I've a suspicion that Andrew might be tiring of his big red asterisk, and wanting to unload mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings.patch mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix.patch mm-vmstat-fix-proc-sys-vm-stat_refresh-generating-false-warnings-fix-2.patch into 5.12. I would prefer not, and reiterate my Nack: but no great harm will befall the cosmos if he overrules that, and it does go through to 5.12 - I'll just want to revert it again later. And I do think a more straightforward way of suppressing those warnings would be just to delete the code that issues them, rather than brushing them under a carpet of overtuning. I've been running mmotm with the patch below (shown as sign of good faith, and for you to try, but not ready to go yet) for a few months now - overriding your max_drift, restoring nr_writeback and friends to the same checking, fixing the obvious reason why nr_zone_write_pending and nr_writeback are seen negative occasionally (interrupt interrupting to decrement those stats before they have even been incremented). Two big BUTs (if not asterisks): since adding that patch, I have usually forgotten all about it, so forgotten to run the script that echoes /proc/sys/vm/stat_refresh at odd intervals while under load: so have less data than I'd intended by now. And secondly (and I've just checked again this evening) I do still see nr_zone_write_pending and nr_writeback occasionally caught negative while under load. So, there's something more at play, perhaps the predicted Gushchin Effect (but wouldn't they go together if so? I've only seen them separately), or maybe something else, I don't know. Those are the only stats I've seen caught negative, but I don't have CMA configured at all. You mention nr_free_cma as the only(?) other stat you've seen negative, that of course I won't see, but looking at the source I now notice that NR_FREE_CMA_PAGES is incremented and decremented according to page migratetype... ... internally we have another stat that's incremented and decremented according to page migratetype, and that one has been seen negative too: isn't page migratetype something that usually stays the same, but sometimes the migratetype of the page's block can change, even while some pages of it are allocated? Not a stable basis for maintaining stats, though won't matter much if they are only for display. vmstat_refresh could just exempt nr_zone_write_pending, nr_writeback and nr_free_cma from warnings, if we cannot find a fix to them: but I see no reason to suppress warnings on all the other vmstats. The patch I've been testing with: --- mmotm/mm/page-writeback.c 2021-02-14 14:32:24.000000000 -0800 +++ hughd/mm/page-writeback.c 2021-02-20 18:01:11.264162616 -0800 @@ -2769,6 +2769,13 @@ int __test_set_page_writeback(struct pag int ret, access_ret; lock_page_memcg(page); + /* + * Increment counts in advance, so that they will not go negative + * if test_clear_page_writeback() comes in to decrement them. + */ + inc_lruvec_page_state(page, NR_WRITEBACK); + inc_zone_page_state(page, NR_ZONE_WRITE_PENDING); + if (mapping && mapping_use_writeback_tags(mapping)) { XA_STATE(xas, &mapping->i_pages, page_index(page)); struct inode *inode = mapping->host; @@ -2804,9 +2811,14 @@ int __test_set_page_writeback(struct pag } else { ret = TestSetPageWriteback(page); } - if (!ret) { - inc_lruvec_page_state(page, NR_WRITEBACK); - inc_zone_page_state(page, NR_ZONE_WRITE_PENDING); + + if (WARN_ON_ONCE(ret)) { + /* + * Correct counts in retrospect, if PageWriteback was already + * set; but does any filesystem ever allow this to happen? + */ + dec_lruvec_page_state(page, NR_WRITEBACK); + dec_zone_page_state(page, NR_ZONE_WRITE_PENDING); } unlock_page_memcg(page); access_ret = arch_make_page_accessible(page); --- mmotm/mm/vmstat.c 2021-02-20 17:59:44.838171232 -0800 +++ hughd/mm/vmstat.c 2021-02-20 18:01:11.272162661 -0800 @@ -1865,7 +1865,7 @@ int vmstat_refresh(struct ctl_table *tab for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) { val = atomic_long_read(&vm_zone_stat[i]); - if (val < -max_drift) { + if (val < 0) { pr_warn("%s: %s %ld\n", __func__, zone_stat_name(i), val); err = -EINVAL; @@ -1874,13 +1874,21 @@ int vmstat_refresh(struct ctl_table *tab #ifdef CONFIG_NUMA for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++) { val = atomic_long_read(&vm_numa_stat[i]); - if (val < -max_drift) { + if (val < 0) { pr_warn("%s: %s %ld\n", __func__, numa_stat_name(i), val); err = -EINVAL; } } #endif + for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) { + val = atomic_long_read(&vm_node_stat[i]); + if (val < 0) { + pr_warn("%s: %s %ld\n", + __func__, node_stat_name(i), val); + err = -EINVAL; + } + } if (err) return err; if (write)