On 3/22/21 11:39 AM, Roman Gushchin wrote:
On Sat, Mar 20, 2021 at 10:02:01AM -0700, Yonghong Song wrote:
Jiri Olsa reported a bug ([1]) in the kernel where the cgroup local
storage pointer may be NULL in the bpf_get_local_storage() helper.
There are two issues uncovered by this bug:
(1). kprobe or tracepoint prog incorrectly sets cgroup local storage
before prog run,
(2). due to change from preempt_disable to migrate_disable,
preemption is possible and percpu storage might be overwritten
by other tasks.
Issue (1) is fixed in [2]. This patch addresses issue (2).
The following shows how things can go wrong:
task 1: bpf_cgroup_storage_set() for percpu local storage
preemption happens
task 2: bpf_cgroup_storage_set() for percpu local storage
preemption happens
task 1: run bpf program
task 1 will effectively use the percpu local storage set by task 2,
which will be either NULL or incorrect.
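The sequence above can be reproduced in a small userspace sketch. This is not the kernel code: the single global pointer stands in for the percpu local storage, and the names old_storage_set() and old_get_local_storage() are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cgroup local storage object. */
struct storage {
	int id;
};

/* One slot per CPU: every task running on this CPU shares it. */
static struct storage *percpu_storage;

/* Models the old set step: unconditionally overwrite the slot. */
static void old_storage_set(struct storage *s)
{
	percpu_storage = s;
}

/* Models the helper lookup: return whatever is in the slot right now. */
static struct storage *old_get_local_storage(void)
{
	return percpu_storage;
}
```

If task 1 calls old_storage_set(&s1), is preempted, and task 2 then calls old_storage_set(&s2), a later old_get_local_storage() from task 1 returns &s2, because nothing ties the slot to the task that filled it.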
Instead of just one common local storage per cpu, this patch fixes
the issue by permitting 8 local storages per cpu, each identified
by a task_struct pointer. This way, we
allow at most 8 nested preemptions between bpf_cgroup_storage_set()
and bpf_cgroup_storage_unset(). The percpu local storage slot
is released (by calling bpf_cgroup_storage_unset()) by the same task
after the bpf program finishes running.
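A minimal userspace sketch of the slot scheme described above, not the patch itself. The real code operates on percpu variables and identifies the owner via the current task_struct; here a plain global array models one CPU, the owner is passed explicitly, and the names NEST_MAX, storage_set(), storage_get() and storage_unset() as well as the -EBUSY return on exhaustion are assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>

#define NEST_MAX 8	/* assumed name for the 8-slot limit */

struct task {		/* stand-in for task_struct */
	int pid;
};

struct storage {	/* stand-in for a cgroup local storage object */
	int id;
};

struct slot {
	const struct task *owner;	/* NULL means the slot is free */
	struct storage *storage;
};

/* Models the slots of a single CPU. */
static struct slot cpu_slots[NEST_MAX];

/* Claim the first free slot for task t; fail if all 8 are taken. */
static int storage_set(const struct task *t, struct storage *s)
{
	for (int i = 0; i < NEST_MAX; i++) {
		if (cpu_slots[i].owner != NULL)
			continue;
		cpu_slots[i].owner = t;
		cpu_slots[i].storage = s;
		return 0;
	}
	return -EBUSY;
}

/* Look up the storage that task t registered, or NULL if none. */
static struct storage *storage_get(const struct task *t)
{
	for (int i = 0; i < NEST_MAX; i++)
		if (cpu_slots[i].owner == t)
			return cpu_slots[i].storage;
	return NULL;
}

/* Release the slot owned by task t after its program has run. */
static void storage_unset(const struct task *t)
{
	for (int i = 0; i < NEST_MAX; i++) {
		if (cpu_slots[i].owner != t)
			continue;
		cpu_slots[i].owner = NULL;
		cpu_slots[i].storage = NULL;
		return;
	}
}
```

With this layout, task 2 calling storage_set() while task 1 is preempted takes a different slot, so task 1 still finds its own storage when it resumes. In the kernel, the scan and update of the slots would additionally have to be protected against preemption; the sketch leaves that out.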
bpf_test_run() is also fixed to use the new bpf_cgroup_storage_set()
interface.
The patch is tested on top of [2] with reproducer in [1].
Without this patch, the kernel emits an error within 2-3 minutes.
With this patch, there is still no error after one hour.
[1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@xxxxxxxxxxxxxx/T
[2] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@xxxxxxxxxxxxxx/T
Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
Cc: Roman Gushchin <guro@xxxxxx>
Signed-off-by: Yonghong Song <yhs@xxxxxx>
It looks a bit artificial (8 storages to handle the nesting), but most likely
it will work in real life and the code looks correct to me.
Indeed. I picked 8 so it is large enough to cover "all" realistic cases.
Please, feel free to add
Acked-by: Roman Gushchin <guro@xxxxxx>
and thank you for fixing it!
Btw, is it intended for a stable backport?
This patch cannot be used for backport as is. The test_run.c in
bpf-next has been refactored and is quite different from the bpf tree.
If we want to backport to bpf/stable, we will need a different
patch. We can address this later.
Thanks!
---
include/linux/bpf-cgroup.h | 57 ++++++++++++++++++++++++++++++++------
include/linux/bpf.h | 22 ++++++++++++---
kernel/bpf/helpers.c | 15 +++++++---
kernel/bpf/local_storage.c | 5 ++--
net/bpf/test_run.c | 6 +++-
5 files changed, 86 insertions(+), 19 deletions(-)
[...]