On Tue, Feb 25, 2020 at 06:51:30PM -0800, Andrew Morton wrote:
> On Tue, 25 Feb 2020 14:15:31 +0000 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > Ivan Babrou reported the following
>
> http://lkml.kernel.org/r/CABWYdi1eOUD1DHORJxTsWPMT3BcZhz++xP1pXhT=x4SgxtgQZA@xxxxxxxxxxxxxx
> is helpful.

Noted for future reference.

> > Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when
> > an external fragmentation event occurs") introduced undesired
> > effects in our environment.
> >
> > * NUMA with 2 x CPU
> > * 128GB of RAM
> > * THP disabled
> > * Upgraded from 4.19 to 5.4
> >
> > Before we saw free memory hover at around 1.4GB with no
> > spikes. After the upgrade we saw some machines decide that they
> > need a lot more than that, with frequent spikes above 10GB,
> > often only on a single numa node.
> >
> > There have been a few reports recently that might be watermark boost
> > related. Unfortunately, finding someone that can reproduce the problem
> > and test a patch has been problematic. This series intends to limit
> > potential damage only.
>
> It's problematic that we don't understand what's happening. And these
> palliatives can only reduce our ability to do that.

Not for certain, no, but we do know that there are conditions whereby
node 0 can end up reclaiming excessively for extended periods of time.
The available evidence does match a pattern whereby a lower zone on
node 0 is getting stuck in a boosted state.

> Rik seems to have the means to reproduce this (or something similar)
> and it seems Ivan can test patches three weeks hence.

If Rik can reproduce it, great, but I have a strong feeling that Ivan
may never be able to test this if it requires a production machine,
which is why I did not wait the three weeks.

> So how about a
> debug patch which will help figure out what's going on in there?

A debug patch would not help much in this case given that we have
tracepoints.
An ftrace capture containing mm_page_alloc_extfrag, mm_vmscan_kswapd_wake,
mm_vmscan_wakeup_kswapd and mm_vmscan_node_reclaim_begin, taken for 30
seconds while the problem is occurring, would be a big help. Ideally
mm_vmscan_lru_shrink_inactive would also be included to capture the
reclaim priority, but the size of the trace is what's going to be
problematic. mm_page_alloc_extfrag would be correlated with the conditions
that boost the watermarks, and the others would track what kswapd is doing
to see if it's persistently reclaiming. If it is,
mm_vmscan_lru_shrink_inactive would tell whether it's persistently
reclaiming at priority DEF_PRIORITY - 2, which would prove the patch would
at least mitigate the problem.

It would be more preferable still to have a description of a testcase that
reproduces the problem, and I'll capture/analyse the trace myself. It
would also be something I could slot into a test grid to catch the problem
happening again in the future.

-- 
Mel Gorman
SUSE Labs
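For anyone wanting to gather the trace described above, a minimal sketch of
the capture command can be built with trace-cmd. This is an illustration,
not a script from the thread: it assumes trace-cmd is installed, that the
capture is run as root, and that the event subsystem prefixes (kmem: for
mm_page_alloc_extfrag, vmscan: for the reclaim events) match the running
kernel's tracepoint layout. The script below only assembles and prints the
command so it can be reviewed before running.

```shell
#!/bin/sh
# Tracepoints Mel asks for; mm_vmscan_lru_shrink_inactive is the optional
# extra that captures the reclaim priority (at the cost of trace size).
EVENTS="kmem:mm_page_alloc_extfrag \
vmscan:mm_vmscan_kswapd_wake \
vmscan:mm_vmscan_wakeup_kswapd \
vmscan:mm_vmscan_node_reclaim_begin \
vmscan:mm_vmscan_lru_shrink_inactive"

# Build the trace-cmd invocation: record each event while sleeping 30s,
# i.e. a 30-second capture window while the problem is occurring.
CMD="trace-cmd record"
for e in $EVENTS; do
    CMD="$CMD -e $e"
done
CMD="$CMD sleep 30"

# Print the command for review; run it (as root) to produce trace.dat,
# then inspect with "trace-cmd report".
echo "$CMD"
```

The resulting trace.dat would then show whether kswapd wakeups persist and
at what priority lru_shrink_inactive is operating.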