On 2/24/22 8:10 PM, Song Liu wrote:
On Thu, Feb 24, 2022 at 8:04 PM Yonghong Song <yhs@xxxxxx> wrote:
On 2/24/22 1:49 PM, Yucong Sun wrote:
In a previous commit (1), the BPF preload process was switched from a
user mode process to the in-kernel light skeleton. However, in the
kernel context the available FDs start from 0, instead of the usual 3
for a user mode process, and the preload process leaked two FDs, taking
over FD 0 and FD 1. This later caused issues when the kernel tried to
set up stdin/stdout/stderr for the init process, which assumes FDs 0, 1
and 2 are available. As seen here:
Before fix:
ls -lah /proc/1/fd/*
lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/0 -> /dev/null
lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/1 -> /dev/null
lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/2 -> /dev/console
lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/6 -> /dev/console
lrwx------ 1 root root 64 Feb 23 17:20 /proc/1/fd/7 -> /dev/console
After Fix / Normal:
ls -lah /proc/1/fd/*
lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/0 -> /dev/console
lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/1 -> /dev/console
lrwx------ 1 root root 64 Feb 24 21:23 /proc/1/fd/2 -> /dev/console
In this patch:
- skel_closenz was changed to skel_closenez to correctly handle the
  FD = 0 case.
- Various places detecting FD > 0 were changed to detect FD >= 0.
- Call the iterators_bpf__detach() function to release FDs after the
  links are obtained.
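For illustration, this is the kind of change meant for the FD checks in
tools/lib/bpf/skel_internal.h (a sketch of the idea only, not the exact
hunk; the new helper name is taken from the list above, and the
kernel-mode build closes via close_fd() rather than close()):

/* before: FD 0 is never closed, so it leaks in the kernel context */
static inline void skel_closenz(int fd)
{
	if (fd > 0)
		close(fd);
}

/* after (hypothetical shape of the renamed helper): FD 0 is closed too */
static inline void skel_closenez(int fd)
{
	if (fd >= 0)
		close(fd);
}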
1: https://github.com/kernel-patches/bpf/commit/cb80ddc67152e72f28ff6ea8517acdf875d7381d
Signed-off-by: Yucong Sun <fallentree@xxxxxx>
---
kernel/bpf/preload/bpf_preload_kern.c | 1 +
kernel/bpf/preload/iterators/iterators.lskel.h | 16 +++++++++-------
tools/bpf/bpftool/gen.c | 9 +++++----
tools/lib/bpf/skel_internal.h | 8 ++++----
4 files changed, 19 insertions(+), 15 deletions(-)
diff --git a/kernel/bpf/preload/bpf_preload_kern.c b/kernel/bpf/preload/bpf_preload_kern.c
index 30207c048d36..c6bb1e72e0f1 100644
--- a/kernel/bpf/preload/bpf_preload_kern.c
+++ b/kernel/bpf/preload/bpf_preload_kern.c
@@ -54,6 +54,7 @@ static int load_skel(void)
 		err = PTR_ERR(progs_link);
 		goto out;
 	}
+	iterators_bpf__detach(skel);
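[ For context, with this change applied load_skel() ends up looking
roughly like this; paraphrased from
kernel/bpf/preload/bpf_preload_kern.c, so exact field names may differ: ]

static int load_skel(void)
{
	int err;

	skel = iterators_bpf__open();
	if (!skel)
		return -ENOMEM;
	err = iterators_bpf__load(skel);
	if (err)
		goto out;
	err = iterators_bpf__attach(skel);
	if (err)
		goto out;
	/* take long-lived references on both iterator links ... */
	maps_link = bpf_link_get_from_fd(skel->links.dump_bpf_map_fd);
	if (IS_ERR(maps_link)) {
		err = PTR_ERR(maps_link);
		goto out;
	}
	progs_link = bpf_link_get_from_fd(skel->links.dump_bpf_prog_fd);
	if (IS_ERR(progs_link)) {
		err = PTR_ERR(progs_link);
		goto out;
	}
	/* ... then release the FDs the skeleton holds for its links, now
	 * that maps_link/progs_link own their own references, so init can
	 * later get stdin/stdout/stderr on FDs 0-2
	 */
	iterators_bpf__detach(skel);
	return 0;
out:
	free_links_and_skel();
	return err;
}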
In fini, we have:
static void __exit fini(void)
{
	bpf_preload_ops = NULL;
	free_links_and_skel();
}

static void free_links_and_skel(void)
{
	if (!IS_ERR_OR_NULL(maps_link))
		bpf_link_put(maps_link);
	if (!IS_ERR_OR_NULL(progs_link))
		bpf_link_put(progs_link);
	iterators_bpf__destroy(skel);
}
Since you did iterators_bpf__detach(skel) in load_skel(),
in fini(), we don't need iterators_bpf__destroy(skel), right?
iterators_bpf__destroy() still cleans up some other things, so
I guess we should just keep it?
Ya, we do need iterators_bpf__destroy(), my mistake. I meant
we don't need the iterators_bpf__detach() inside iterators_bpf__destroy().
But since it is inside iterators_bpf__destroy(), creating
a separate function like iterators_bpf__destroy() but
without the iterators_bpf__detach(skel) call seems like overkill.
Maybe add a comment to free_links_and_skel() to indicate that
iterators_bpf__detach(skel) has already been done and that the repeated
detach call inside iterators_bpf__destroy() is a no-op.
Otherwise, people may be confused about whether there is a layering
violation here.
static void
iterators_bpf__destroy(struct iterators_bpf *skel)
{
	if (!skel)
		return;
	iterators_bpf__detach(skel);
	skel_closenz(skel->progs.dump_bpf_map.prog_fd);
	skel_closenz(skel->progs.dump_bpf_prog.prog_fd);
	skel_free_map_data(skel->rodata, skel->maps.rodata.initial_value, 4096);
	skel_closenz(skel->maps.rodata.map_fd);
	skel_free(skel);
}
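Something like this, perhaps (just a sketch of where the suggested
comment could go, not part of the posted patch):

static void free_links_and_skel(void)
{
	if (!IS_ERR_OR_NULL(maps_link))
		bpf_link_put(maps_link);
	if (!IS_ERR_OR_NULL(progs_link))
		bpf_link_put(progs_link);
	/* load_skel() already called iterators_bpf__detach(skel); the
	 * detach repeated inside iterators_bpf__destroy() below is a no-op.
	 */
	iterators_bpf__destroy(skel);
}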