On Mon, Mar 11, 2019 at 01:10:36PM -0700, Suren Baghdasaryan wrote:
> The idea seems interesting although I need to think about this a bit
> more. Killing processes based on failed page allocation might backfire
> during transient spikes in memory usage.

This issue could be alleviated if tasks could be killed and have their
pages reaped faster. Currently, Linux takes a _very_ long time to free a
task's memory after an initial privileged SIGKILL is sent to it, even
when the task's priority is set to the highest possible (so unwanted
scheduler preemption starving dying tasks of CPU time is not the issue
at play here). I've frequently measured the time between when a SIGKILL
is sent to a task and when free_task() is called for that task to be
hundreds of milliseconds, which is incredibly long. AFAIK, this is a
problem that LMKD suffers from as well, and perhaps any OOM killer
implementation in Linux, since you cannot evaluate the effect a kill has
had on memory pressure for at least several tens of milliseconds.

> AFAIKT the biggest issue with using this approach in userspace is that
> it's not practically implementable without heavy in-kernel support.
> How to implement such interaction between kernel and userspace would
> be an interesting discussion which I would be happy to participate in.

You could signal a lightweight userspace process that has maximum
scheduler priority and have it kill the tasks it'd like.

Thanks,
Sultan
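
One rough way to observe the kill-to-reap delay from userspace is
sketched below in Python. The function names and the 64 MiB default are
illustrative assumptions, not anything from the kernel or LMKD. The
return of waitpid() is used as a proxy for the victim's memory having
been torn down; it does not correspond exactly to free_task() being
called in the kernel, so treat the numbers as approximate. Raising the
killer to SCHED_FIFO normally needs CAP_SYS_NICE, so the sketch falls
back to the default policy when that is not permitted.

```python
import os
import signal
import time


def try_max_priority():
    """Try to move this process to SCHED_FIFO at the maximum priority.

    Returns True on success, False if the call is not permitted or not
    available on this platform.
    """
    try:
        prio = os.sched_get_priority_max(os.SCHED_FIFO)
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(prio))
        return True
    except (OSError, AttributeError):
        return False


def measure_kill_latency(alloc_mb=64):
    """Fork a child that dirties alloc_mb MiB, SIGKILL it, and return
    (seconds until waitpid() reports the child, raw wait status)."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: touch every page so there is real memory to tear down.
        try:
            os.close(ready_r)
            buf = bytearray(alloc_mb * 1024 * 1024)
            for off in range(0, len(buf), 4096):
                buf[off] = 1
            os.write(ready_w, b"x")  # tell the parent we are ready
            while True:
                signal.pause()       # sleep until SIGKILL arrives
        finally:
            os._exit(1)              # never fall through into parent code

    # Parent: wait until the child has finished allocating.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)

    start = time.monotonic()
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    elapsed = time.monotonic() - start
    return elapsed, status


if __name__ == "__main__":
    boosted = try_max_priority()
    elapsed, status = measure_kill_latency()
    print(f"SCHED_FIFO: {boosted}, kill-to-reap: {elapsed * 1000:.2f} ms")
```

Running this with different alloc_mb values gives a feel for how the
delay scales with the victim's resident memory on a given system.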