Re: [RFC PATCH 1/2] mm, oom: Introduce bpf_select_task

Michal Hocko <mhocko@xxxxxxxx> · Mon, 7 Aug 2023 10:32:34 +0200

On Sat 05-08-23 07:55:56, Chuyi Zhou wrote:
> Hello,
> 
> On 2023/8/4 19:34, Alan Maguire wrote:
[...]
> > I don't know anything about OOM mechanisms, so maybe it's just me, but I
> > found this confusing. Relying on the previous iteration to control
> > current iteration behaviour seems risky - even if BPF found a victim in
> > iteration N, it's no guarantee it will in iteration N+1.
> > 
> The current kernel's OOM actually works like this:
> 
> 1. if we first find a valid candidate victim A in iteration N, we would
> record it in oc->chosen.
> 
> 2. In iteration N + 1, N+2..., we just compare oc->chosen with the current
> iterating task. Suppose we think current task B is better than
> oc->chosen(A), we would set oc->chosen = B and we would not consider A
> anymore.
> 
> IIUC, most policies work like this. We just need to find the *most* suitable
> victim. Normally, if in the current iteration we drop A and select B, we
> would not consider A anymore.

Yes, we iterate over all tasks in the specific oom domain (all tasks for
global oom and all members of the memcg tree for hard limit oom). The
in-tree oom policy has to iterate over all tasks to achieve some of its
goals (like preventing overkill while the previously selected victim is
still on the way out). Also, oom_score_adj might change the final
decision, so you really have to check all eligible tasks.
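
As a rough illustration of the full scan described above, here is a
user-space sketch. It is only a model under stated assumptions: struct
task, struct oom_control_model, evaluate_task() and select_victim() are
hypothetical stand-ins, not the kernel's definitions, and the points
field stands for a badness score that already has oom_score_adj folded
in.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins, not the kernel's structures. */
struct task {
	const char *name;
	long points;	/* badness score, higher means a better victim */
	int eligible;	/* 0 models a task the policy must not pick */
};

struct oom_control_model {
	const struct task *chosen;	/* best candidate seen so far */
	long chosen_points;
};

/* Compare one task against the current candidate, keep the better one. */
static void evaluate_task(const struct task *t, struct oom_control_model *oc)
{
	if (!t->eligible)
		return;
	if (oc->chosen && t->points <= oc->chosen_points)
		return;
	/* t beats the previous candidate, which is not considered again. */
	oc->chosen = t;
	oc->chosen_points = t->points;
}

/* Walk every task in the domain; the result is known only at the end. */
static const struct task *select_victim(const struct task *tasks, size_t n)
{
	struct oom_control_model oc = { .chosen = NULL, .chosen_points = 0 };

	for (size_t i = 0; i < n; i++)
		evaluate_task(&tasks[i], &oc);
	return oc.chosen;
}
```

Because a later task can always outscore the current candidate, the
loop cannot stop early without risking a different final decision.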

I can imagine a BPF-based policy could be less constrained and, as Roman
suggested, have pre-selected victims on standby. I do not see a problem
with having a break-like mode, similar to the current abort but without
canceling an already noted victim.
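
To make the difference between break and abort concrete, a minimal
user-space sketch follows. Everything in it is an assumption made for
illustration: the policy_verdict values, struct candidate,
scan_with_policy() and preselected_policy() are hypothetical names and
do not correspond to any interface in this patch series. The sketch
also returns NULL for an abort, whereas the kernel tells "aborted" and
"no victim found" apart.

```c
#include <stddef.h>

/* Hypothetical verdicts a policy callback could return per task. */
enum policy_verdict {
	POLICY_SKIP,	/* not a victim, keep scanning */
	POLICY_SELECT,	/* note this task as the victim, keep scanning */
	POLICY_BREAK,	/* stop scanning, keep the already noted victim */
	POLICY_ABORT,	/* stop scanning, cancel the noted victim */
};

struct candidate {
	int pid;
};

typedef enum policy_verdict (*policy_fn)(const struct candidate *c, void *ctx);

static const struct candidate *scan_with_policy(const struct candidate *cands,
						size_t n, policy_fn policy,
						void *ctx)
{
	const struct candidate *chosen = NULL;

	for (size_t i = 0; i < n; i++) {
		switch (policy(&cands[i], ctx)) {
		case POLICY_SKIP:
			break;
		case POLICY_SELECT:
			chosen = &cands[i];
			break;
		case POLICY_BREAK:
			return chosen;	/* victim survives the early exit */
		case POLICY_ABORT:
			return NULL;	/* victim is dropped */
		}
	}
	return chosen;
}

/* Example policy state: a victim picked ahead of time, kept on standby. */
struct preselect_state {
	int wanted_pid;
	int found;
	int visited;	/* number of tasks the policy was asked about */
};

static enum policy_verdict preselected_policy(const struct candidate *c,
					      void *ctx)
{
	struct preselect_state *st = ctx;

	st->visited++;
	if (st->found)
		return POLICY_BREAK;	/* nothing more to learn, stop here */
	if (c->pid == st->wanted_pid) {
		st->found = 1;
		return POLICY_SELECT;
	}
	return POLICY_SKIP;
}
```

With a pre-selected pid the scan stops one task after the match
instead of walking the whole domain, and the noted victim is still
returned.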
-- 
Michal Hocko
SUSE Labs