Re: [patch -mm 08/18] oom: badness heuristic rewrite

> This is a complete rewrite of the oom killer's badness() heuristic which is
> used to determine which task to kill in oom conditions.  The goal is to
> make it as simple and predictable as possible so the results are better
> understood and we end up killing the task which will lead to the most
> memory freeing while still respecting the fine-tuning from userspace.
> 
> The baseline for the heuristic is a proportion of memory that each task is
> currently using in memory plus swap compared to the amount of "allowable"
> memory.  "Allowable," in this sense, means the system-wide resources for
> unconstrained oom conditions, the set of mempolicy nodes, the mems
> attached to current's cpuset, or a memory controller's limit.  The
> proportion is given on a scale of 0 (never kill) to 1000 (always kill),
> roughly meaning that a task with a badness() score of 500 consumes
> approximately 50% of allowable memory resident in RAM or in swap
> space.
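
For readers following along, the baseline described above can be sketched
roughly as follows. This is only an illustration of the arithmetic, not the
patch's code: the function name, the page-count parameters, and the clamping
to the 0..1000 range are assumptions made for the example.

```python
def badness_score(rss_pages: int, swap_pages: int, allowable_pages: int) -> int:
    """Return a 0..1000 score: the share of allowable memory a task uses.

    Hypothetical helper; all names and the clamping are assumptions.
    """
    if allowable_pages <= 0:
        raise ValueError("allowable_pages must be positive")
    # Memory resident in RAM plus memory in swap, scaled to 1000 points.
    points = (rss_pages + swap_pages) * 1000 // allowable_pages
    # Keep the result on the documented 0 (never kill) .. 1000 (always kill) scale.
    return max(0, min(points, 1000))
```

With 400 pages resident and 100 pages in swap out of 1000 allowable pages,
the sketch gives a score of 500, which matches the "approximately 50%"
reading in the description.
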
> 
> The proportion is always relative to the amount of "allowable" memory and
> not the total amount of RAM systemwide so that mempolicies and cpusets may
> operate in isolation; they shall not need to know the true size of the
> machine on which they are running if they are bound to a specific set of
> nodes or mems, respectively.
> 
> Root tasks are given 3% extra memory just like __vm_enough_memory()
> provides in LSMs.  In the event of two tasks consuming similar amounts of
> memory, it is generally better to save root's task.
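
One possible reading of the 3% root allowance, sketched under the
assumption that it is applied as a flat discount of 30 points on the
1000-point scale. The description does not say how the bonus is applied,
so the constant, the function name, and the floor at zero are all
hypothetical.

```python
# Assumption: 3% of the 1000-point scale, applied as a flat discount.
ROOT_BONUS_POINTS = 30


def apply_root_bonus(points: int, is_root: bool) -> int:
    """Discount a root-owned task's score by the assumed 3% bonus."""
    if is_root:
        points -= ROOT_BONUS_POINTS
    # Assumption: a score never drops below zero.
    return max(points, 0)
```

Under this reading, a root task and a non-root task that both use 50% of
allowable memory would score 470 and 500 respectively, so the non-root task
is preferred as the victim.
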
> 
> Because of the change in the badness() heuristic's baseline, it is also
> necessary to introduce a new user interface to tune it.  It's not possible
> to redefine the meaning of /proc/pid/oom_adj with a new scale since the
> ABI cannot be changed for backward compatibility.  Instead, a new tunable,
> /proc/pid/oom_score_adj, is added that ranges from -1000 to +1000.  It may
> be used to polarize the heuristic such that certain tasks are never
> considered for oom kill while others may always be considered.  The value
> is added directly into the badness() score so a value of -500, for
> example, means to discount 50% of its memory consumption in comparison to
> other tasks either on the system, bound to the mempolicy, in the cpuset,
> or sharing the same memory controller.
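
A small sketch of how a value that is "added directly into the badness()
score" would behave. The function name is hypothetical, and clamping the
result back onto the 0..1000 scale is an assumption; the description only
states the addition and the -1000..+1000 range of the tunable.

```python
def adjusted_score(points: int, oom_score_adj: int) -> int:
    """Add a per-task oom_score_adj value to a baseline 0..1000 score."""
    if not -1000 <= oom_score_adj <= 1000:
        raise ValueError("oom_score_adj must be within -1000..1000")
    # Assumption: the sum is clamped so it stays on the 0..1000 scale.
    return max(0, min(points + oom_score_adj, 1000))
```

A task using 70% of allowable memory with oom_score_adj set to -500 would
then compete as if it used 20% (score 200). A value of -1000 drives any
score to 0, and +1000 drives any score to 1000, which is the polarizing
effect described above.
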
> 
> /proc/pid/oom_adj is changed so that its meaning is rescaled into the
> units used by /proc/pid/oom_score_adj, and vice versa.  Changing one of
> these per-task tunables will rescale the value of the other to an
> equivalent meaning.  Although /proc/pid/oom_adj was originally defined as
> a bitshift on the badness score, it now shares the same linear growth as
> /proc/pid/oom_score_adj but with different granularity.  This is required
> so the ABI is not broken with userspace applications and allows oom_adj to
> be deprecated for future removal.
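
The rescaling between the two tunables might look something like the sketch
below. It assumes oom_adj keeps its traditional range of -17 (disable) to
+15, that the mapping is linear with -17 corresponding to -1000, and that
the maximum value is special-cased so +15 maps to +1000. None of these
details are spelled out in the description, so treat the constants and
function names as assumptions.

```python
OOM_ADJ_MIN, OOM_ADJ_MAX = -17, 15   # assumed oom_adj range
SCORE_ADJ_MAX = 1000                 # oom_score_adj range is -1000..1000


def _div_trunc(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def adj_to_score_adj(oom_adj: int) -> int:
    """Rescale an oom_adj value into oom_score_adj units."""
    if not OOM_ADJ_MIN <= oom_adj <= OOM_ADJ_MAX:
        raise ValueError("oom_adj out of range")
    if oom_adj == OOM_ADJ_MAX:
        # Assumption: the top value means "always kill", so map it to the max.
        return SCORE_ADJ_MAX
    return _div_trunc(oom_adj * SCORE_ADJ_MAX, -OOM_ADJ_MIN)


def score_adj_to_adj(score_adj: int) -> int:
    """Rescale an oom_score_adj value into oom_adj units."""
    if not -SCORE_ADJ_MAX <= score_adj <= SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj out of range")
    oom_adj = _div_trunc(score_adj * -OOM_ADJ_MIN, SCORE_ADJ_MAX)
    # Assumption: values above the oom_adj maximum are clamped to it.
    return min(oom_adj, OOM_ADJ_MAX)
```

Because oom_adj is much coarser, a round trip is lossy in this sketch:
oom_adj 8 becomes oom_score_adj 470, which converts back to oom_adj 7.
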
> 
> Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>

nack

