On Thu, Aug 10, 2023 at 1:13 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
>  static int oom_evaluate_task(struct task_struct *task, void *arg)
>  {
>         struct oom_control *oc = arg;
> @@ -317,6 +339,26 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
>         if (!is_memcg_oom(oc) && !oom_cpuset_eligible(task, oc))
>                 goto next;
>
> +       /*
> +        * If task is allocating a lot of memory and has been marked to be
> +        * killed first if it triggers an oom, then select it.
> +        */
> +       if (oom_task_origin(task)) {
> +               points = LONG_MAX;
> +               goto select;
> +       }
> +
> +       switch (bpf_oom_evaluate_task(task, oc)) {
> +       case BPF_EVAL_ABORT:
> +               goto abort; /* abort search process */
> +       case BPF_EVAL_NEXT:
> +               goto next; /* ignore the task */
> +       case BPF_EVAL_SELECT:
> +               goto select; /* select the task */
> +       default:
> +               break; /* No BPF policy */
> +       }
> +

I think forcing the bpf prog to look at every task is going to be
limiting long term.
It's more flexible to invoke the bpf prog from out_of_memory()
and, if it doesn't choose a task, fall back to select_bad_process().
I believe that's what Roman was proposing.
The bpf prog can choose to iterate memcgs, or it might have some side
knowledge that there are processes that can be set as oc->chosen
right away, so it can skip the iteration.
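
To make the call order concrete, here is a rough userspace sketch, not
kernel code. Everything prefixed with mock_ (the types, the hook typedef,
both functions) and the example policy are made up for illustration and
only mimic the shape of out_of_memory(), select_bad_process() and
oc->chosen. The point it tries to show: the policy is asked once per OOM
event, and the per-task scan runs only if the policy leaves the victim
unset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace mock-ups; none of these are the real kernel definitions. */
struct mock_task {
	const char *comm;	/* task name */
	long points;		/* badness score, higher means worse */
};

struct mock_oom_control {
	struct mock_task *tasks;	/* candidate tasks */
	size_t nr_tasks;
	struct mock_task *chosen;	/* victim, NULL until one is picked */
};

/* Hypothetical policy hook: may set oc->chosen or leave it NULL. */
typedef void (*mock_oom_policy_fn)(struct mock_oom_control *oc);

/* Fallback: scan every task and keep the one with the highest score. */
static void mock_select_bad_process(struct mock_oom_control *oc)
{
	for (size_t i = 0; i < oc->nr_tasks; i++) {
		struct mock_task *t = &oc->tasks[i];

		if (!oc->chosen || t->points > oc->chosen->points)
			oc->chosen = t;
	}
}

/* The policy runs once per OOM event, not once per task. */
static int mock_out_of_memory(struct mock_oom_control *oc,
			      mock_oom_policy_fn policy)
{
	assert(oc != NULL);
	oc->chosen = NULL;

	if (policy)
		policy(oc);			/* policy may pick a victim */
	if (!oc->chosen)
		mock_select_bad_process(oc);	/* no decision: default scan */

	return oc->chosen != NULL;
}

/* Example policy: picks a task it knows by name, ignoring the scores. */
static void policy_kill_batch_job(struct mock_oom_control *oc)
{
	for (size_t i = 0; i < oc->nr_tasks; i++) {
		if (strcmp(oc->tasks[i].comm, "batch-job") == 0) {
			oc->chosen = &oc->tasks[i];
			return;
		}
	}
}
```

With this shape the policy decides how, or whether, to iterate. A policy
that already knows its victim can set it directly, and passing no policy
at all leaves the default selection untouched.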