Re: user space unresponsive, followup: lsf/mm congestion

On Tue, Jan 7, 2020 at 1:58 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Tue 07-01-20 13:29:20, Chris Murphy wrote:
> > Hi,
> >
> > This is in response to:
> > https://lore.kernel.org/linux-fsdevel/20200104090955.GF23195@xxxxxxxxxxxxxxxxxxx/T/#m8b25fd42501d780d8053fc7aa9f4e3a28a19c49f
> >
> > I decided to open a bug report for tracking and attachments but I'm
> > also subscribed now to this list so - either here or there.
> >
> > "loss of responsiveness during heavy swap"
> > https://bugzilla.kernel.org/show_bug.cgi?id=206117
>
> Please collect more snapshots of /proc/vmstat (e.g. in 1s intervals)

OK.
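
A minimal sketch of how the 1s snapshots could be collected, assuming
Python 3 is on the test system. The function names and the
vmstat.<timestamp> file naming are my own choices for illustration, not
something requested in the thread.

```python
import time
from pathlib import Path


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def collect(count, interval=1.0, outdir="."):
    """Write `count` raw snapshots of /proc/vmstat, `interval` seconds apart.

    Each snapshot goes to its own file named vmstat.<unix timestamp>
    (a hypothetical naming scheme) so the files sort chronologically.
    """
    out = Path(outdir)
    for _ in range(count):
        stamp = int(time.time())
        raw = Path("/proc/vmstat").read_text()
        (out / f"vmstat.{stamp}").write_text(raw)
        time.sleep(interval)
```

Calling collect(60) before starting the compile would give roughly one
minute of snapshots; parse_vmstat() is only there to make it easy to diff
counters such as pswpin and pswpout between two files afterwards.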


> Btw. from a quick look at the sysrq output there seems to be quite a lot
> of tasks (more than 1k) running on the system. Only handful of them
> belong to the compilation. kswapd is busy and 13 processes in direct
> reclaim all swapping out to the disk.

There are probably many dozens of Firefox tabs open with nothing loaded
in them; I'm trying to keep the testing closer to real-world use (a
compile while browsing) rather than being too deferential to the compile.
That does clutter the sysrq+t output, but it doesn't change the outcome:
the central culprit is the ninja compile, which by default runs n+2 jobs,
where n is the number of virtual CPUs.
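
To make the arithmetic concrete, here is a rough sketch of the n+2 rule
as described above, next to a memory-aware cap. The cap and the per-job
memory estimate are hypothetical and only illustrate the idea; they are
not how ninja actually chooses its default.

```python
def default_jobs(n_cpus):
    """Job count as described above: n + 2 for n virtual CPUs."""
    return n_cpus + 2


def memory_capped_jobs(n_cpus, mem_available_bytes, per_job_bytes):
    """Limit n + 2 to the number of jobs that fit in available memory.

    per_job_bytes is a guess supplied by the caller (an assumption);
    the result is never lower than 1 so the build can still progress.
    """
    fit = mem_available_bytes // per_job_bytes
    return max(1, min(default_jobs(n_cpus), fit))
```

With 8 virtual CPUs the default comes out at 10 jobs. If each job were
assumed to need about 2 GiB and 8 GiB were available, the capped value
would be 4, which could be passed explicitly as 'ninja -j 4'.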

> From the above, my first guess would be that you are over subscribing
> memory you have available. I would focus on who is consuming all that
> memory.

ninja - I have made the argument that it is in some sense sabotaging
the system, and I think they're trying to do something a little
smarter with their defaults. However, it's an unprivileged task acting
as a kind of fork bomb that takes down the system. It's a really
eyebrow-raising and remarkable experience, and it's common within the
somewhat vertical use case of developers compiling things on their own
systems. Many IDEs use a ton of resources, as much as they can get.
It's not clear to me by what mechanism either the user or these
processes are supposed to effectively negotiate for limited resources,
other than resource restriction. In any case, these aren't contrived or
malicious examples. A much more synthetic example is 'tail /dev/zero',
which is arrested much more quickly by the kernel oom-killer, at
least on recent kernels.
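
As an illustration of what resource restriction could look like here,
the following sketch builds a systemd-run command line that puts the
build in its own scope with memory limits. It assumes a systemd user
session on cgroup v2 with the memory controller delegated to the user;
the helper name and the limit values are hypothetical.

```python
def scoped_command(cmd, memory_high, memory_max):
    """Wrap `cmd` so it runs in a transient systemd scope with memory limits.

    MemoryHigh is the throttling threshold and MemoryMax the hard limit;
    the values passed in are the caller's choice (assumptions here).
    """
    return [
        "systemd-run", "--user", "--scope",
        "-p", f"MemoryHigh={memory_high}",
        "-p", f"MemoryMax={memory_max}",
        "--", *cmd,
    ]
```

For example, passing scoped_command(["ninja"], "6G", "8G") to
subprocess.run() would start the compile under those limits, so that
reclaim pressure and any OOM kill should stay confined to the build
rather than affecting the whole desktop.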



-- 
Chris Murphy



