On Wed 19-02-20 12:42:20, Sultan Alsawaf wrote:
> On Wed, Feb 19, 2020 at 09:05:27PM +0100, Michal Hocko wrote:
[...]
> > Again, do you have more details about the workload and what was the
> > cause of responsiveness issues? Because I would expect that the
> > situation would be quite opposite because it is usually the direct
> > reclaim that is a source of stalls visible from userspace. Or is this
> > about a single CPU situation where kswapd saturates the single CPU and
> > all other tasks are just not getting enough CPU cycles?
>
> The workload was having lots of applications open at once. At a certain point
> when memory ran low, my system became sluggish and kswapd CPU usage skyrocketed.

Could you provide more details please? Is kswapd making forward
progress? Have you checked why other processes are sluggish? Do they not
get CPU time, or are they blocked on something?

> I added printks into kswapd with this patch, and my premature exit in kswapd
> kicked in quite often.
>
> > > On systems with more memory I tested (>=4G), kswapd becomes more expensive to
> > > run at its higher scan depths, so stopping kswapd prematurely when there aren't
> > > any memory allocations waiting for it prevents it from reaching the *really*
> > > expensive scan depths and burning through even more resources.
> > >
> > > Combine a large amount of memory with a slow CPU and the current problematic
> > > behavior of kswapd at high memory pressure shows. My personal test scenario for
> > > this was an arm64 CPU with a variable amount of memory (up to 4G RAM + 2G swap).
> >
> > But still, somebody has to put the system into balanced state so who is
> > going to do all the work?
>
> All the work will be done by kswapd of course, but only if it's needed.
>
> The real problem is that a single memory allocation failure, and free memory
> being some amount below the high watermark, are not good heuristics to predict
> *future* memory allocation needs.
> They are good for determining how to steer
> kswapd to help satisfy a failed allocation in the present, but anything more is
> pure speculation (which turns out to be wrong speculation, since this behavior
> causes problems).

Well, you might be right that there might be better heuristics than the
existing watermark based one. After all, nobody can predict the future.
The existing heuristic aims at keeping min_free_kbytes of memory free
as much as possible, and that tends to work reasonably well for a large
set of workloads.

> If there are outstanding failed allocations that won't go away, then it's
> perfectly reasonable to keep kswapd running until it frees pages up to the high
> watermark. But beyond that is unnecessary, since there's no way to know if or
> when kswapd will need to fire up again. This makes sense considering how kswapd
> is currently invoked: it's fired up due to a failed allocation of some sort, not
> because the amount of free memory dropped below the high watermark.

Very broadly speaking (sorry if I am stating the obvious here), kswapd
is woken up when the allocator hits the low watermark or the requested
high order pages are depleted. The allocator then enters its slow path.
That means that the background reclaim then aims at reclaiming the
high-low watermark gap or invokes compaction to keep the balance. That
gap has to be consumed before kswapd is woken again for order-0 (most
common) requests. So this is usually not about a single allocation
triggering the background reclaim, and counting failures on low
watermark attempts is unlikely to work with the current code in the way
you suggested.
-- 
Michal Hocko
SUSE Labs
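
To illustrate the wakeup pattern described above, here is a toy model
of the watermark logic. It is only a sketch and not kernel code: the
Zone class, the page counts, the batch size and the helper names are
all made up for illustration, and it ignores high-order requests,
compaction and direct reclaim. It only shows that once kswapd has
refilled a zone up to the high watermark, the whole high-low gap has
to be consumed by order-0 allocations before the next wakeup.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Toy model of one memory zone; all values are page counts (made up)."""
    free: int
    wmark_min: int
    wmark_low: int
    wmark_high: int
    kswapd_running: bool = False
    wakeups: int = 0


def allocate(zone: Zone, pages: int = 1) -> bool:
    """Order-0 allocation; wakes kswapd if it would dip below the low watermark."""
    if zone.free - pages < zone.wmark_low and not zone.kswapd_running:
        zone.kswapd_running = True
        zone.wakeups += 1
    if zone.free - pages < zone.wmark_min:
        return False  # the real allocator would try direct reclaim here
    zone.free -= pages
    return True


def kswapd_run(zone: Zone, batch: int = 32) -> int:
    """Background reclaim: free pages in batches up to the high watermark."""
    reclaimed = 0
    while zone.kswapd_running and zone.free < zone.wmark_high:
        step = min(batch, zone.wmark_high - zone.free)
        zone.free += step
        reclaimed += step
    zone.kswapd_running = False  # balanced again, go back to sleep
    return reclaimed


def simulate(n_allocs: int) -> Zone:
    """Run n_allocs single-page allocations, letting kswapd finish each time."""
    zone = Zone(free=1000, wmark_min=100, wmark_low=200, wmark_high=300)
    for _ in range(n_allocs):
        allocate(zone)
        if zone.kswapd_running:
            kswapd_run(zone)
    return zone
```

With these made-up numbers, the first wakeup only happens once free
memory has dropped to the low watermark, and each later wakeup needs
the 100-page high-low gap to be consumed again first.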