On Wed, Feb 19, 2020 at 09:05:27PM +0100, Michal Hocko wrote:
> Could you be more specific please? kswapd should stop as soon as the
> high watermark is reached. If that is not the case then there is a bug
> which should be fixed.

No, there is no bug causing kswapd to continue beyond the high watermark.

> Sure it is quite possible that kswapd is busy for extended amount of
> time if the memory pressure is continuous.
>
> > On a constrained system I tested (mem=2G), this patch had the positive effect of
> > improving overall responsiveness at high memory pressure.
>
> Again, do you have more details about the workload and what was the
> cause of responsiveness issues? Because I would expect that the
> situation would be quite opposite because it is usually the direct
> reclaim that is a source of stalls visible from userspace. Or is this
> about a single CPU situation where kswapd saturates the single CPU and
> all other tasks are just not getting enough CPU cycles?

The workload was having lots of applications open at once. At a certain point
when memory ran low, my system became sluggish and kswapd CPU usage skyrocketed.
I added printks into kswapd with this patch, and my premature exit in kswapd
kicked in quite often.

> > On systems with more memory I tested (>=4G), kswapd becomes more expensive to
> > run at its higher scan depths, so stopping kswapd prematurely when there aren't
> > any memory allocations waiting for it prevents it from reaching the *really*
> > expensive scan depths and burning through even more resources.
> >
> > Combine a large amount of memory with a slow CPU and the current problematic
> > behavior of kswapd at high memory pressure shows. My personal test scenario for
> > this was an arm64 CPU with a variable amount of memory (up to 4G RAM + 2G swap).
>
> But still, somebody has to put the system into balanced state so who is
> going to do all the work?

All the work will be done by kswapd of course, but only if it's needed.

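To make "only if it's needed" concrete, here is a rough toy model of the two stopping rules being discussed. This is not the patch and not kernel code: `struct node_state`, its fields, and all three function names are made up for this sketch, and the "one waiter satisfied per pass" rule is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified reclaim state; NOT the kernel's pg_data_t. */
struct node_state {
	unsigned long free_pages;     /* pages currently free */
	unsigned long high_wmark;     /* high watermark, in pages */
	unsigned int  pending_allocs; /* allocations still waiting on reclaim */
};

/* Rule described by Michal: keep going until the high watermark is reached. */
static bool keep_going_current(const struct node_state *n)
{
	return n->free_pages < n->high_wmark;
}

/* Rule argued for here: same upper bound, but also stop as soon as
 * nobody is waiting on kswapd anymore. */
static bool keep_going_proposed(const struct node_state *n)
{
	if (n->free_pages >= n->high_wmark)
		return false;
	return n->pending_allocs > 0;
}

/* Count reclaim passes until the chosen rule says stop. Toy assumptions:
 * every pass frees `batch` pages and satisfies at most one waiter. */
static unsigned int count_passes(struct node_state n, unsigned long batch,
				 bool proposed)
{
	unsigned int passes = 0;

	assert(batch > 0); /* a zero batch would never reach the watermark */

	while (proposed ? keep_going_proposed(&n) : keep_going_current(&n)) {
		n.free_pages += batch;
		if (n.pending_allocs > 0)
			n.pending_allocs--;
		passes++;
	}
	return passes;
}
```

With 100 free pages, a high watermark of 200, two waiters, and 10 pages freed per pass, the first rule runs 10 passes while the second stops after 2, once both waiters are gone.
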
The real problem is that a single memory allocation failure, and free memory
being some amount below the high watermark, are not good heuristics to predict
*future* memory allocation needs. They are good for determining how to steer
kswapd to help satisfy a failed allocation in the present, but anything more is
pure speculation (which turns out to be wrong speculation, since this behavior
causes problems).

If there are outstanding failed allocations that won't go away, then it's
perfectly reasonable to keep kswapd running until it frees pages up to the high
watermark. But anything beyond that is unnecessary, since there's no way to know
if or when kswapd will need to fire up again. This makes sense considering how
kswapd is currently invoked: it's fired up due to a failed allocation of some
sort, not because the amount of free memory dropped below the high watermark.

Sultan