While not directly answering your question, we (Facebook) use oomd[0] widely
across our fleet to solve the exact problem you have. I'd be happy to answer
any questions about it. If configured correctly, it should be much more
reliable and less heavy-handed than a global memory.max. In theory, cgroup
OOMs are subject to the same "livelocks" as the kernel OOM killer.

Daniel

[0]: https://github.com/facebookincubator/oomd

On Sun, Mar 17, 2019, at 9:13 AM, Tomasz Chmielewski wrote:
> I think most of us saw the situation when the system becomes
> unresponsive - to a point when SSH in doesn't work - because it's out of
> memory and kernel's OOM-killer doesn't kick in as fast as it should.
>
> I have a server which from time to time - let's say once a week - is
> using too much memory. High memory usage can be caused by several
> unrelated worker processes. Some of these workers have memory leaks
> which are hard to diagnose.
>
> What happens next - the system becomes very slow for 1-30 minutes, until
> kernel's OOM-killer kicks in. Offending process is killed, memory is
> released - everything works smooth again. I'm not so worried about the
> killed process; I'm more worried that the server is unresponsive for so
> long.
>
> Ideal situation would be - the offending process is killed before the
> system becomes very slow. However, OOM in the Linux kernel doesn't seem
> to work this way (at least not always).
>
> So I thought about "tricking it":
>
> - move the server to a container (LXD in this case)
> - assign the container slightly less RAM than total system RAM (i.e.
> 15.5 GB for a container, where the system has 16 GB RAM)
>
> The result was great - the system is responsive at all times, even if
> some processes misbehave and try to use all RAM (OOM-killer kicks in in
> container's cgroup, but the system as a whole is never out of memory
> from kernel's point of view)!
>
> How about achieving a similar result with just systemd? Is there some
> system-wide MemoryMax which we could easily set in one place?
>
> I.e. a desktop system where user opens several browsers, with too many
> tabs with too many memory-intensive pages - becomes unresponsive for
> long minutes, before OOM-killer finally kills the offender.
>
> Tomasz Chmielewski
> _______________________________________________
> systemd-devel mailing list
> systemd-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/systemd-devel
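
For illustration only, the arithmetic behind the "slightly less RAM than the
total" trick in the quoted message can be sketched in Python. This is a rough
sketch under stated assumptions, not something proposed in the thread: the
function names are hypothetical, and the 512 MiB default headroom is chosen
only to mirror the 15.5 GB out of 16 GB example. Which unit or slice such a
limit should be applied to is left open, as it is in the question.

```python
def parse_mem_total_kib(meminfo_text: str) -> int:
    """Return the MemTotal value (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:       16777216 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def limit_with_headroom(total_kib: int, headroom_mib: int = 512) -> int:
    """Return a byte limit equal to total memory minus the headroom.

    headroom_mib is an assumed value for illustration, not a recommendation.
    """
    limit = total_kib * 1024 - headroom_mib * 1024 * 1024
    if limit <= 0:
        raise ValueError("headroom exceeds total memory")
    return limit


def memory_max_setting(meminfo_text: str, headroom_mib: int = 512) -> str:
    """Format the limit as a MemoryMax= assignment, with the value in bytes."""
    total_kib = parse_mem_total_kib(meminfo_text)
    return f"MemoryMax={limit_with_headroom(total_kib, headroom_mib)}"
```

On a Linux host, passing the contents of /proc/meminfo to
memory_max_setting() yields a MemoryMax= line with the value in bytes. As
noted above, a cgroup limit like this can in theory still hit the same
"livelocks" as the kernel OOM killer, so it is at best a coarse safety net.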