[RFC PATCH v2 5/5] bpf: Add a BPF OOM policy Doc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This patch adds a new doc Documentation/bpf/oom.rst to describe how
BPF OOM policy is supposed to work.

Signed-off-by: Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx>
---
 Documentation/bpf/oom.rst | 70 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100644 Documentation/bpf/oom.rst

diff --git a/Documentation/bpf/oom.rst b/Documentation/bpf/oom.rst
new file mode 100644
index 000000000000..9bad1fd30d4a
--- /dev/null
+++ b/Documentation/bpf/oom.rst
@@ -0,0 +1,70 @@
+=============
+BPF OOM Policy
+=============
+
+The Out Of Memory Killer (aka OOM Killer) is invoked when the system is
+critically low on memory. The in-kernel implementation is to iterate over
+all tasks in the specific oom domain (all tasks for global and all members
+of memcg tree for hard limit oom) and select a victim based some heuristic
+policy to kill.
+
+Specifically:
+
+1. Begin to iterate tasks using ``oom_evaluate_task()`` and find a valid (killable)
+   victim in iteration N, select it.
+
+2. In iteration N + 1, N + 2..., we compare the current iteration task with the
+   previous selected task, if current is more suitable then select it.
+
+3. finally we get a victim to kill.
+
+However, this does not meet the needs of users in some special scenarios. Using
+the eBPF capabilities, We can implement customized OOM policies to meet needs.
+
+Developer API:
+==================
+
+bpf_oom_evaluate_task
+----------------------
+
+``bpf_oom_evaluate_task`` is a new interface hooking into ``oom_evaluate_task()``
+which is used to bypass the in-kernel selection logic. Users can customize their
+victim selection policy through BPF programs attached to it.
+::
+
+    int bpf_oom_evaluate_task(struct task_struct *task,
+                                struct oom_control *oc);
+
+return value::
+
+    NO_BPF_POLICY     no bpf policy and would fallback to the in-kernel selection
+    BPF_EVAL_ABORT    abort the selection (exit from current selection loop)
+    BPF_EVAL_NEXT     ignore the task
+    BPF_EAVL_SELECT   select the current task
+
+Suppose we want to select a victim based on the specified pid when OOM is
+invoked, we can use the following BPF program::
+
+    SEC("fmod_ret/bpf_oom_evaluate_task")
+    int BPF_PROG(bpf_oom_evaluate_task, struct task_struct *task, struct oom_control *oc)
+    {
+        if (task->pid == target_pid)
+            return BPF_EAVL_SELECT;
+        return BPF_EVAL_NEXT;
+    }
+
+bpf_set_policy_name
+---------------------
+
+``bpf_set_policy_name`` is a interface hooking before the start of victim selection. We can
+set policy's name in the attached program, so dump_header() can identify different policies
+when reporting messages. We can set policy's name through kfunc ``set_oom_policy_name``
+::
+
+    SEC("fentry/bpf_set_policy_name")
+    int BPF_PROG(set_police_name_k, struct oom_control *oc)
+    {
+	    char name[] = "my_policy";
+	    set_oom_policy_name(oc, name, sizeof(name));
+	    return 0;
+    }
\ No newline at end of file
-- 
2.20.1





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux