Re: [RFC] memory reserve for userspace oom-killer

Shakeel Butt <shakeelb@xxxxxxxxxx> · Tue, 4 May 2021 17:37:32 -0700

On Wed, Apr 21, 2021 at 7:29 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
[...]
> > > What if the pool is depleted?
> >
> > This would mean that either the estimate of mempool size is bad or
> > oom-killer is buggy and leaking memory.
> >
> > I am open to any design directions for mempool or some other way where
> > we can provide a notion of memory guarantee to oom-killer.
>
> OK, thanks for the clarification. There will certainly be hard problems
> to sort out[1] but the overall idea makes sense to me and it sounds like
> a much better approach than an OOM-specific solution.
>
>
> [1] - how the pool is going to be replenished without hitting all
> potential reclaim problems (thus dependencies on all other tasks
> directly/indirectly) yet without relying on any background workers to
> do that on the task's behalf without proper accounting etc...
> --

I am currently weighing two paths here:

First, the mempool, exposed through either prctl or a new syscall.
Users would need to trace their userspace oom-killer (or whatever
their use case is) to find the mempool size they need, and would
periodically refill the mempool when the state of the machine allows
it. The challenge here is finding a good value for the mempool size
and coordinating the refills.
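
To make the sizing and refill questions concrete, here is a rough
userspace model of such a reserve. It is only a sketch of the
semantics, not a proposed interface: ReservePool, target_pages and
low_watermark are hypothetical names, and nothing here corresponds to
an existing prctl option or syscall.

```python
class ReservePool:
    """Toy model of a per-process memory reserve (hypothetical, not a kernel API)."""

    def __init__(self, target_pages):
        self.target_pages = target_pages   # size found by tracing the oom-killer
        self.free_pages = target_pages     # assume the pool starts out full

    def alloc(self, pages):
        """Take pages from the reserve; False means the pool is depleted."""
        if pages > self.free_pages:
            return False
        self.free_pages -= pages
        return True

    def refill(self, machine_free_pages, low_watermark):
        """Top up only from memory above low_watermark; return pages added."""
        spare = max(0, machine_free_pages - low_watermark)
        added = min(spare, self.target_pages - self.free_pages)
        self.free_pages += added
        return added
```

A failed alloc() is the "pool is depleted" case discussed above: either
target_pages was underestimated or the consumer is leaking. refill()
deliberately adds nothing when the machine is at or below the
watermark, which is the "if allowed by the state of the machine" part.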

Second is a much simplified mix of Roman's and Peter's suggestions: a
very simple watchdog with a kill-list of processes. If userspace does
not pet the watchdog within a specified time, the watchdog kills all
the processes in the kill-list. The challenge here is maintaining and
updating the kill-list.
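
The intended watchdog behaviour can be sketched as a small userspace
model. This is an illustration under assumptions, not a proposed
interface: KillListWatchdog, kill_fn and the injectable clock are
hypothetical names, and in a real design the expiry check would run
from a kernel timer rather than being called by hand.

```python
import time


class KillListWatchdog:
    """Toy model of the kill-list watchdog (hypothetical, not a kernel API)."""

    def __init__(self, timeout_s, kill_fn, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.kill_fn = kill_fn         # assumed callback, e.g. one that sends SIGKILL
        self.clock = clock             # injectable so the model can be tested
        self.kill_list = []
        self.last_pet = clock()

    def set_kill_list(self, pids):
        """Userspace replaces the whole list; keeping it current is the hard part."""
        self.kill_list = list(pids)

    def pet(self):
        """Userspace proves it is alive and restarts the timeout."""
        self.last_pet = self.clock()

    def check(self):
        """On expiry, kill every process on the list; return the pids killed."""
        if self.clock() - self.last_pet < self.timeout_s:
            return []
        killed = list(self.kill_list)
        for pid in killed:
            self.kill_fn(pid)
        self.kill_list.clear()
        return killed
```

Note that the model kills the entire list on a single expiry, as
described above. It makes no attempt to pick victims, so all policy
stays in whatever userspace writes into the kill-list.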

I would prefer whichever direction oomd and lmkd are open to adopting.

Any suggestions?