Re: [PATCH 0/10] psi: pressure stall information for CPU, memory, and IO v2

Johannes Weiner <hannes@xxxxxxxxxxx> · Wed, 18 Jul 2018 18:21:57 -0400

On Tue, Jul 17, 2018 at 01:25:15PM +0200, Michal Hocko wrote:
> On Mon 16-07-18 10:57:45, Daniel Drake wrote:
> > Hi Johannes,
> > 
> > Thanks for your work on psi! 
> > 
> > We have also been investigating the "thrashing problem" on our Endless
> > desktop OS. We have seen that systems can easily get into a state where the
> > UI becomes unresponsive to input, and the mouse cursor becomes extremely
> > slow or stuck when the system is running out of memory. We are working with
> > a full GNOME desktop environment on systems with only 2GB RAM, and
> > sometimes no real swap (although zram-swap helps mitigate the problem to
> > some extent).
> > 
> > My analysis so far indicates that when the system is low on memory and hits
> > this condition, the system is spending much of the time under
> > __alloc_pages_direct_reclaim. "perf trace -F" shows many many page faults
> > in executable code while this is going on. I believe the kernel is
> > swapping out executable code in order to satisfy memory allocation
> > requests, but then that swapped-out code is needed a moment later so it
> > gets swapped in again via the page fault handler, and all this activity
> > severely hampers the system's ability to respond to user input.
> > 
> > I appreciate the kernel's attempt to keep processes alive, but in the
> > desktop case we see that the system rarely recovers from this situation,
> > so you have to hard shutdown. In this case we view it as desirable that
> > the OOM killer would step in (it is not doing so because direct reclaim
> > is not actually failing).

Yes, we currently use a userspace application that monitors pressure
and issues OOM kills (there is usually plenty of headroom left for a
small application to run even after quality of service for most
workloads has tanked to unacceptable levels). We eventually want to
add this back into the kernel with the appropriate configuration
options (pressure threshold value, sustained duration, etc.).
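
For illustration, the monitoring half of such a userspace daemon could
start out like the sketch below. It assumes the pressure file is
exposed as /proc/pressure/memory with lines of the form
"some avg10=... avg60=... avg300=... total=..." (the exact format may
differ between revisions of this series). The function names are
hypothetical and not taken from any existing tool.

```python
def parse_pressure(text):
    """Parse psi-style text into a nested dict.

    Assumed input format, one line per stall type:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    Returns e.g. {"some": {"avg10": 0.0, ..., "total": 0}, ...}.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kind, values = fields[0], {}
        for field in fields[1:]:
            key, _, raw = field.partition("=")
            # "total" is a cumulative counter; the averages are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read and parse the memory pressure file (path is an assumption)."""
    with open(path) as f:
        return parse_pressure(f.read())
```

A daemon would call read_memory_pressure() periodically and act on,
say, the "full" avg10 value. Which line and which averaging window to
use is a policy choice.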

> Yes this is really unfortunate. One thing that could help would be to
> consider a thrashing level during the reclaim (get_scan_count) to simply
> forget about LRUs which are constantly refaulting pages back. We already
> have the infrastructure for that. We just need to plumb it in.

This doesn't work without quantifying the actual time you're spending
on thrashing IO. The cutoff for acceptable refaults is very different
between rotating disks, crappy SSDs, and high-end flash.

But in the future we might want the OOM killer to monitor psi memory
levels and dispatch tasks when we sustain X percent for Y seconds.
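
The "X percent for Y seconds" rule can be sketched as a small state
machine. This is only an illustration of the policy, not kernel code:
the class name, its parameters, and the choice to reset the timer as
soon as pressure dips below the threshold are all assumptions.

```python
class SustainedPressure:
    """Report when pressure has stayed at or above a threshold long enough."""

    def __init__(self, threshold_pct, duration_s):
        self.threshold_pct = threshold_pct  # X: pressure percentage
        self.duration_s = duration_s        # Y: seconds it must be sustained
        self._above_since = None            # time the current episode began

    def update(self, now, pressure_pct):
        """Feed one sample; return True once the condition is met.

        `now` is a monotonic timestamp in seconds, `pressure_pct` the
        sampled pressure (for example a psi memory average).
        """
        if pressure_pct < self.threshold_pct:
            # Pressure dropped: the episode ends and the timer resets.
            self._above_since = None
            return False
        if self._above_since is None:
            self._above_since = now
        return now - self._above_since >= self.duration_s
```

A caller would feed it time.monotonic() and a sampled psi memory value
on every poll, and pick a victim once update() returns True. Victim
selection is deliberately left out here.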