Re: [patch 00/11] userspace out of memory handling

On Tue, 4 Mar 2014 19:58:38 -0800 (PST) David Rientjes <rientjes@xxxxxxxxxx> wrote:

> This patchset implements userspace out of memory handling.
> 
> It is based on v3.14-rc5.  Individual patches will apply cleanly or you
> may pull the entire series from
> 
> 	git://git.kernel.org/pub/scm/linux/kernel/git/rientjes/linux.git mm/oom
> 
> When the system or a memcg is oom, processes running on that system or
> attached to that memcg cannot allocate memory.  It is impossible for a
> process to reliably handle the oom condition from userspace.
> 
> First, consider only system oom conditions.  When memory is completely
> depleted and nothing may be reclaimed, the kernel is forced to free some
> memory; the only way it can do so is to kill a userspace process.  This
> will happen instantaneously and userspace can enforce neither its own
> policy nor collect information.
> 
> On system oom, there may be a hierarchy of memcgs that represent user
> jobs, for example.  Each job may have a priority independent of their
> current memory usage.  There is no existing kernel interface to kill the
> lowest priority job; userspace can now kill the lowest priority job or
> allow priorities to change based on whether the job is using more memory
> than its pre-defined reservation.
> 
> Additionally, users may want to log the condition or debug applications
> that are using too much memory.  They may wish to collect heap profiles
> or are able to do memory freeing without killing a process by throttling
> or ratelimiting.
> 
> Interactive users using X window environments may wish to have a dialogue
> box appear to determine how to proceed -- it may even allow them shell
> access to examine the state of the system while oom.
> 
> It's not sufficient to simply restrict all user processes to a subset of
> memory and oom handling processes to the remainder via a memcg hierarchy:
> kernel memory and other page allocations can easily deplete all memory
> that is not charged to a user hierarchy of memory.
> 
> This patchset allows userspace to do all of these things by defining a
> small memory reserve that is accessible only by processes that are
> handling the notification.
> 
> Second, consider memcg oom conditions.  Processes need no special
> knowledge of whether they are attached to the root memcg, where memcg
> charging will always succeed, or a child memcg where charging will fail
> when the limit has been reached.  This allows those processes handling
> memcg oom conditions to overcharge the memcg by the amount of reserved
> memory.  They need not create child memcgs with smaller limits and
> attach the userspace oom handler only to the parent; such support would
> not allow userspace to handle system oom conditions anyway.
> 
> This patchset introduces a standard interface through memcg that allows
> both of these conditions to be handled in the same clean way: users
> define memory.oom_reserve_in_bytes to define the reserve and this
> amount is allowed to be overcharged to the process handling the oom
> condition's memcg.  If used with the root memcg, this amount is allowed
> to be allocated below the per-zone watermarks for root processes that
> are handling such conditions (only root may write to
> cgroup.event_control for the root memcg).
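
For reference, a rough sketch of how a handler might arm such a notification, assuming the existing cgroup-v1 eventfd mechanism (write "<event_fd> <fd of memory.oom_control>" to cgroup.event_control). The function name, its parameters and the cgroup directory are made up for illustration. memory.oom_reserve_in_bytes is the file this patchset proposes, so that write only works with the series applied.

```python
import os


def register_oom_notification(cgroup_dir, event_fd, reserve_bytes=None):
    """Arm an oom notification on a cgroup-v1 memcg directory.

    Hypothetical helper. Returns the open fd for memory.oom_control,
    which the caller should keep open while the notification is armed.
    """
    if reserve_bytes is not None:
        # Assumption: memory.oom_reserve_in_bytes is the file proposed by
        # this patchset; it does not exist on kernels without the series.
        reserve_path = os.path.join(cgroup_dir, "memory.oom_reserve_in_bytes")
        with open(reserve_path, "w") as f:
            f.write(str(reserve_bytes))

    control_fd = os.open(os.path.join(cgroup_dir, "memory.oom_control"),
                         os.O_RDONLY)
    try:
        # cgroup-v1 event registration: "<event_fd> <control_fd>".
        with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as f:
            f.write(f"{event_fd} {control_fd}")
    except OSError:
        os.close(control_fd)
        raise
    return control_fd
```

On a real system event_fd would come from os.eventfd(0) (Python 3.10 or later), and a blocking os.read(event_fd, 8) would return once the memcg hits oom.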

If process A is trying to allocate memory, cannot do so and the
userspace oom-killer is invoked, there must be a means by which process
A waits for the userspace oom-killer's action.  And there must be
fallbacks for the case where the userspace oom killer fails to clear
the oom condition, or times out.
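
To make the question concrete, here is a toy userspace model of the behaviour being asked about: the waiter blocks until the handler signals completion, and falls back if nothing arrives in time. This is not how the patchset implements it. The function name, the use of a plain file descriptor as the completion signal and the "handled"/"fallback" results are assumptions for illustration only.

```python
import os
import select


def wait_for_oom_handler(done_fd, timeout_s):
    """Wait for the handler to signal done_fd, or give up after timeout_s.

    Hypothetical model: returns "handled" if the handler signalled in
    time, otherwise "fallback" (standing in for the kernel oom killer).
    """
    readable, _, _ = select.select([done_fd], [], [], timeout_s)
    if readable:
        os.read(done_fd, 8)  # consume the completion signal
        return "handled"
    return "fallback"
```

The model only covers the timeout case. A handler that reports completion without actually freeing memory would need a second check before the allocation is retried.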

Would be interested to see a description of how all this works.


It is unfortunate that this feature is memcg-only.  Surely it could
also be used by non-memcg setups.  Would like to see at least a
detailed description of how this will all be presented and implemented.
We should aim to make the memcg and non-memcg userspace interfaces and
user-visible behaviour as similar as possible.

Patches 1, 2, 3 and 5 appear to be independent and useful so I think
I'll cherrypick those, OK?