On Wed, Aug 16, 2023 at 7:51 PM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
>
> Hello,
>
> On 2023/8/17 10:07, Alexei Starovoitov wrote:
> > On Thu, Aug 10, 2023 at 1:13 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
> >>  static int oom_evaluate_task(struct task_struct *task, void *arg)
> >>  {
> >>  	struct oom_control *oc = arg;
> >> @@ -317,6 +339,26 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
> >>  	if (!is_memcg_oom(oc) && !oom_cpuset_eligible(task, oc))
> >>  		goto next;
> >>
> >> +	/*
> >> +	 * If task is allocating a lot of memory and has been marked to be
> >> +	 * killed first if it triggers an oom, then select it.
> >> +	 */
> >> +	if (oom_task_origin(task)) {
> >> +		points = LONG_MAX;
> >> +		goto select;
> >> +	}
> >> +
> >> +	switch (bpf_oom_evaluate_task(task, oc)) {
> >> +	case BPF_EVAL_ABORT:
> >> +		goto abort; /* abort search process */
> >> +	case BPF_EVAL_NEXT:
> >> +		goto next; /* ignore the task */
> >> +	case BPF_EVAL_SELECT:
> >> +		goto select; /* select the task */
> >> +	default:
> >> +		break; /* No BPF policy */
> >> +	}
> >> +
> >
> > I think forcing bpf prog to look at every task is going to be limiting
> > long term.
> > It's more flexible to invoke bpf prog from out_of_memory()
> > and if it doesn't choose a task then fall back to select_bad_process().
> > I believe that's what Roman was proposing.
> > bpf can choose to iterate memcg or it might have some side knowledge
> > that there are processes that can be set as oc->chosen right away,
> > so it can skip the iteration.
>
> IIUC, we may need some new bpf features if we want to iterate
> tasks/memcgs in BPF, such as:
> bpf_for_each_task
> bpf_for_each_memcg
> bpf_for_each_task_in_memcg
> ...
>
> It seems we have some work to do first on the BPF side.
> Will these iteration features be useful in other BPF scenarios besides
> OOM policy?

Yes.
Use open coded iterators though.
Like the example in
https://lore.kernel.org/all/20230810183513.684836-4-davemarchevsky@xxxxxx/

bpf_for_each(task_vma, vma, task, 0) { ... }

will safely iterate the vmas of the task.
Similarly, struct css_task_iter can be hidden inside a bpf open coded iterator.
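
To make the two ideas above concrete, here is a rough userspace sketch in plain C. It is not BPF and not kernel code. It only models (a) the new/next/destroy triple that, as far as I understand, bpf_for_each() wraps for an open coded iterator, and (b) the "ask the policy first, fall back to the default scan if it has no opinion" flow suggested for out_of_memory(). Every mock_* name, policy_choose(), default_choose() and select_victim() are hypothetical names made up for illustration, and rss_pages is just a stand-in for a badness score.

```c
/* Userspace model only. All names below are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>

struct mock_task {
	int pid;
	long rss_pages;     /* stand-in for a badness score */
	int policy_victim;  /* nonzero: the policy's "side knowledge" marks it */
};

/* Iterator state lives in the caller's frame, as with open coded iterators. */
struct mock_task_iter {
	const struct mock_task *cur;
	const struct mock_task *end;
};

static void mock_task_iter_new(struct mock_task_iter *it,
			       const struct mock_task *tasks, size_t n)
{
	assert(it != NULL && tasks != NULL);
	it->cur = tasks;
	it->end = tasks + n;
}

/* Returns the next element, or NULL once the sequence is exhausted. */
static const struct mock_task *mock_task_iter_next(struct mock_task_iter *it)
{
	if (it->cur == it->end)
		return NULL;
	return it->cur++;
}

/* Nothing to free in this model, but destroy is always paired with new. */
static void mock_task_iter_destroy(struct mock_task_iter *it)
{
	it->cur = it->end = NULL;
}

/* Policy hook: returns a victim, or NULL meaning "no opinion". */
static const struct mock_task *policy_choose(const struct mock_task *tasks,
					     size_t n)
{
	struct mock_task_iter it;
	const struct mock_task *t, *chosen = NULL;

	mock_task_iter_new(&it, tasks, n);
	while ((t = mock_task_iter_next(&it)) != NULL) {
		if (t->policy_victim) {
			chosen = t;
			break;  /* the policy may stop early */
		}
	}
	mock_task_iter_destroy(&it);
	return chosen;
}

/* Default scan (stand-in for select_bad_process()): largest rss_pages wins. */
static const struct mock_task *default_choose(const struct mock_task *tasks,
					      size_t n)
{
	struct mock_task_iter it;
	const struct mock_task *t, *best = NULL;

	mock_task_iter_new(&it, tasks, n);
	while ((t = mock_task_iter_next(&it)) != NULL) {
		if (best == NULL || t->rss_pages > best->rss_pages)
			best = t;
	}
	mock_task_iter_destroy(&it);
	return best;
}

/* Ask the policy first; fall back to the default scan if it declines. */
static const struct mock_task *select_victim(const struct mock_task *tasks,
					     size_t n)
{
	const struct mock_task *chosen = policy_choose(tasks, n);

	return chosen ? chosen : default_choose(tasks, n);
}
```

With tasks {pid 1, 100 pages}, {pid 2, 500 pages} and {pid 3, 50 pages, policy_victim set}, select_victim() returns pid 3 because the policy picked it. With the third entry dropped, the policy returns NULL and the default scan picks pid 2. The point of the shape is that the policy owns the loop, so it can stop early or skip iterating altogether instead of being called back once per task.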