Re: user space unresponsive, followup: lsf/mm congestion

On Tue 07-01-20 14:25:46, Chris Murphy wrote:
> On Tue, Jan 7, 2020 at 1:58 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
[...]
> > Btw. from a quick look at the sysrq output there seem to be quite a lot
> > of tasks (more than 1k) running on the system. Only a handful of them
> > belong to the compilation. kswapd is busy and 13 processes are in direct
> > reclaim, all swapping out to disk.
> 
> There might be many dozens of tabs in Firefox with nothing loaded in
> them, to keep the testing more real world (a compile while browsing)
> rather than being too deferential to the compile. That does clutter
> the sysrq+t output, but it doesn't change the central culprit, which
> is the ninja compile; by default it runs n+2 jobs, where n is the
> number of virtual CPUs.

How much memory does the compile process eat?

> > From the above, my first guess would be that you are over subscribing
> > memory you have available. I would focus on who is consuming all that
> > memory.
> 
> ninja - I have made the argument that it is in some sense sabotaging
> the system, and I think they're trying to do something a little
> smarter with their defaults; however, it's an unprivileged task acting
> as a kind of fork bomb that takes down the system.

Well, I am not sure the fork bomb analogy is appropriate. There are only
a dozen compile processes captured, so unless there are many more in
other phases this is really negligible compared to the rest of the
workload running on the system.

> It's a really
> eyebrow-raising and remarkable experience. And it's common within the
> somewhat vertical use case of developers compiling things on their own
> systems. Many IDEs use a ton of resources, as much as they can get.
> It's not clear to me by what mechanism either the user or these
> processes are supposed to effectively negotiate for limited resources,
> other than resource restriction. But anyway, they aren't contrived or
> malicious examples.

If you know that the compilation process is too disruptive wrt.
memory/cpu consumption, then you can use cgroups (memory and cpu
controllers) to throttle that consumption and protect the rest of the
system. The compilation will of course take much more time, and the
explicit configuration is obviously less comfortable than an out of the
box auto configuration, but the kernel simply doesn't have the
information to prioritize resources on its own.
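For illustration, a minimal sketch of such a setup, assuming cgroup v2
mounted at /sys/fs/cgroup; the group name "build" and the limits are
made up and would need tuning for the actual machine:

  # create a group for the build and throttle it (run as root)
  mkdir /sys/fs/cgroup/build
  echo "4G" > /sys/fs/cgroup/build/memory.high          # reclaim/throttle above 4G
  echo "400000 100000" > /sys/fs/cgroup/build/cpu.max   # at most 4 CPUs worth of time
  # move the current shell into the group and run the build from there
  echo $$ > /sys/fs/cgroup/build/cgroup.procs
  ninja

On a systemd based desktop roughly the same can be done ad hoc with
something like "systemd-run --user --scope -p MemoryHigh=4G -p
CPUQuota=400% ninja".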

I do agree that the oom detection could be improved to detect heavy
thrashing - be it on page cache or swap in/out - and kill something
rather than leave the system struggling in a highly unproductive state.
This is far from trivial because what counts as productive is not
something the kernel can tell easily, as it depends on the workload. As
mentioned elsewhere, userspace is likely much better suited to define
that policy, and PSI seems to be a good indicator.
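The pressure information is already exported via /proc/pressure/memory,
so a userspace policy can start out as simple as the sketch below (the
30% threshold is arbitrary, and a real tool like oomd/systemd-oomd would
kill or freeze something rather than just complain):

  # poll memory PSI once a second; "some avg10" is the share of the last
  # 10s during which at least one task was stalled waiting on memory
  while sleep 1; do
      some=$(awk '/^some/ { sub("avg10=", "", $2); print $2 }' /proc/pressure/memory)
      awk -v s="$some" 'BEGIN { exit !(s > 30) }' && \
          echo "memory pressure high: some avg10=$some%"
  done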

> A much more synthetic example is 'tail /dev/zero'
> which is much more quickly arrested by the kernel oom-killer, at
> least on recent kernels.

Yeah, the same as any other memory leak: the memory will simply run out
at some point and the OOM killer can detect that with good confidence.
It is the thrashing (working set not fitting into memory and refaulting
like crazy) that the kernel struggles (and loses) to handle.
-- 
Michal Hocko
SUSE Labs



