Hi Michal,
On 7/8/22 4:54 PM, Michal Hocko wrote:
> On Fri 08-07-22 16:21:24, Gang Li wrote:
> > TLDR
> > ----
> > If a mempolicy or cpuset is in effect, out_of_memory() will select a
> > victim on the specific node to kill, so that the kernel can avoid
> > accidental killing on NUMA systems.
> We have discussed this in your previous posting and an alternative
> proposal was to use cpusets to partition NUMA aware workloads and
> enhance the oom killer to be cpuset aware instead, which should be a
> much easier solution.
> > Problem
> > -------
> > Before this patch series, the OOM killer selects its victim by picking
> > the process with the highest oom_badness score across the entire
> > system.
> > This works fine on UMA systems, but can cause accidental killing on
> > NUMA systems.
> > For example, if process c.out is bound to Node1 and keeps allocating
> > pages from Node1, a.out will be killed first. But killing a.out does
> > not free any memory on Node1, so c.out will be killed next.
> > Many AMD machines have 8 NUMA nodes. On these systems, there is a
> > greater chance of triggering this problem.
> Please be more specific about existing usecases which suffer from the
> current OOM handling limitations.
I was just going through the mailing list and happened to see this.
There is another use case for us concerning per-NUMA memory usage.
Say we have several important latency-critical (LC) services sitting on
different NUMA nodes with no overlap. The memory demand of these LC
services varies, so the free memory on each node also differs. We then
launch several background containers without cpuset constraints to
consume the leftover resources. The problem is that there does not seem
to be a suitable memory policy available to balance usage between the
nodes, which can leave memory-heavy LC services under high memory
pressure, failing to meet their SLOs.
It would be much appreciated if you could shed some light on this!
Thanks & BR,
Abel