On Wed, Jan 8, 2020 at 2:25 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Tue 07-01-20 14:25:46, Chris Murphy wrote:
> > On Tue, Jan 7, 2020 at 1:58 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> [...]
> > > Btw. from a quick look at the sysrq output there seems to be quite
> > > a lot of tasks (more than 1k) running on the system. Only a handful
> > > of them belong to the compilation. kswapd is busy and 13 processes
> > > are in direct reclaim, all swapping out to the disk.
> >
> > There might be many dozens of tabs in Firefox with nothing loaded in
> > them, trying to keep the testing more real world (a compile while
> > browsing) rather than being too deferential to the compile. That does
> > clutter the sysrq+t output, but it doesn't change the outcome or the
> > central culprit, which is the ninja compile; by default it runs n+2
> > jobs where n is the number of virtual CPUs.
>
> How much memory does the compile process eat?

By default ninja sets the job count to numcpus+2, which is 10 here. But
each job variably involves two processes, and each process's memory
requirement varies a lot, from a few MiB to over 1 GiB. In the first 20
minutes, about 13000 processes have started and stopped. I've updated
the bug, attaching kernel messages and /proc/vmstat sampled in 1 s
increments, although quite often during the build multiple seconds of
sampling were simply skipped because the system was under too much
pressure.

> If you know that the compilation process is too disruptive wrt.
> memory/cpu consumption then you can use cgroups (memory and cpu
> controllers) to throttle that consumption and protect the rest of the
> system. The compilation process will take much more time of course and
> the explicit configuration is obviously less comfortable than out of
> the box auto configuration but the kernel simply doesn't have
> information to prioritize resources.

Yes, but this isn't scalable for regular users who just follow an
upstream's build instructions.

> I do agree that the oom detection could be improved to detect heavy
> thrashing - be it on page cache or swap in/out - and kill something
> rather than leave the system struggling in a highly unproductive
> state. This is far from trivial because what is productive is not
> something the kernel can tell easily as it depends on the workload. As
> mentioned elsewhere, userspace is likely much better suited to define
> that policy, and PSI seems to be a good indicator.

And even userspace doesn't know in advance what resources are required.
The user can guess that the estimate was wrong, force a power off, and
start over, passing a lower number of jobs or whatever.

As for PSI, from the oomd folks it sounds like swap is a requirement.
And yet, because of the poor performance of swapping, quite a lot of
users don't have any swap. Server environments are mixed on having
swap, and it's rare in cloud environments. So if there's a hard
requirement on swap existing, PSI isn't a universal solution.

Thanks,

--
Chris Murphy
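
P.S. To make the cgroup suggestion concrete, here is a rough sketch of
the manual setup involved, which also illustrates why regular users
won't do it. It assumes cgroup v2 mounted at /sys/fs/cgroup with the
memory controller enabled and root privileges; the group name "build"
and the 4G/6G thresholds are invented for illustration:

    import os
    import subprocess

    cg = "/sys/fs/cgroup/build"
    os.makedirs(cg, exist_ok=True)

    # memory.high: soft limit; the kernel reclaims from and throttles
    # the group once it grows past this.
    with open(os.path.join(cg, "memory.high"), "w") as f:
        f.write(str(4 * 1024**3))

    # memory.max: hard cap; the group is OOM-killed rather than
    # allowed to exceed it.
    with open(os.path.join(cg, "memory.max"), "w") as f:
        f.write(str(6 * 1024**3))

    # Move this process into the group, then run the build so ninja
    # and every child compiler process inherits the limits.
    with open(os.path.join(cg, "cgroup.procs"), "w") as f:
        f.write(str(os.getpid()))
    subprocess.run(["ninja"])

And on the PSI side, the kernel interface is just /proc/pressure/memory
(4.20+ with PSI enabled), so a minimal userspace watcher looks
something like the following; the 10% threshold and the 2 s poll
interval are arbitrary, and this is only a sketch of the idea, not what
oomd actually implements:

    import time

    def memory_full_avg10():
        # Each "full" line looks like:
        # full avg10=1.23 avg60=0.45 avg300=0.10 total=12345678
        with open("/proc/pressure/memory") as f:
            for line in f:
                if line.startswith("full"):
                    return float(line.split()[1].split("=", 1)[1])
        return 0.0

    while True:
        if memory_full_avg10() > 10.0:
            print("sustained memory stalls; a policy daemon would "
                  "pick a victim here")
        time.sleep(2)

Neither piece is hard in isolation; turning them into a policy that is
right for every workload is exactly the part the kernel can't do.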