On Tue, May 16, 2023 at 1:52 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Mon, May 15, 2023 at 05:13:46PM -0700, Andrii Nakryiko wrote:
> > Current UAPI of BPF_OBJ_PIN and BPF_OBJ_GET commands of bpf() syscall
> > forces users to specify pinning location as a string-based absolute or
> > relative (to current working directory) path. This has various
> > implications related to security (e.g., symlink-based attacks), forces
> > BPF FS to be exposed in the file system, which can cause races with
> > other applications.
> >
> > One of the feedbacks we got from folks working with containers heavily
> > was that inability to use purely FD-based location specification was an
> > unfortunate limitation and hindrance for BPF_OBJ_PIN and BPF_OBJ_GET
> > commands. This patch closes this oversight, adding path_fd field to
> > BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by
> > *at() syscalls for dirfd + pathname combinations.
> >
> > This now allows interesting possibilities like working with detached BPF
> > FS mount (e.g., to perform multiple pinnings without running a risk of
> > someone interfering with them), and generally making pinning/getting
> > more secure and not prone to any races and/or security attacks.
> >
> > This is demonstrated by a selftest added in subsequent patch that takes
> > advantage of new mount APIs (fsopen, fsconfig, fsmount) to demonstrate
> > creating detached BPF FS mount, pinning, and then getting BPF map out of
> > it, all while never exposing this private instance of BPF FS to outside
> > worlds.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> >  include/linux/bpf.h            |  4 ++--
> >  include/uapi/linux/bpf.h       |  5 +++++
> >  kernel/bpf/inode.c             | 16 ++++++++--------
> >  kernel/bpf/syscall.c           |  8 +++++---
> >  tools/include/uapi/linux/bpf.h |  5 +++++
> >  5 files changed, 25 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 36e4b2d8cca2..f58895830ada 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2077,8 +2077,8 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
> >  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
> >  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
> >
> > -int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
> > -int bpf_obj_get_user(const char __user *pathname, int flags);
> > +int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
> > +int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
> >
> >  #define BPF_ITER_FUNC_PREFIX "bpf_iter_"
> >  #define DEFINE_BPF_ITER_FUNC(target, args...) \
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 1bb11a6ee667..db2870a52ce0 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -1420,6 +1420,11 @@ union bpf_attr {
> >  		__aligned_u64	pathname;
> >  		__u32		bpf_fd;
> >  		__u32		file_flags;
> > +		/* same as dirfd in openat() syscall; see openat(2)
> > +		 * manpage for details of dirfd/path_fd and pathname semantics;
> > +		 * zero path_fd implies AT_FDCWD behavior
> > +		 */
> > +		__u32		path_fd;
>
> I'd probably call it dir_fd to emphasize the similarity,
> but I don't mind path_fd as well

I considered that, but it's really not necessarily a directory, it
could be a specific file location (with O_PATH), so I felt like a more
generic "path_fd" would be better (plus we have *path*name to combine
with). It's minor, I can be convinced if others feel strongly about
this.
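
For anyone less familiar with the dirfd + pathname convention that
path_fd borrows, here is a minimal userspace sketch using plain
openat()/faccessat() on a temporary directory. It does not touch bpf()
at all; the helper names (create_at, demo_dirfd_resolution) and the
/tmp location are made up for illustration. It only shows how a
relative pathname is resolved against an FD (here opened with O_PATH)
while an absolute pathname ignores it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create an empty file `name` relative to `dirfd`. */
static int create_at(int dirfd, const char *name)
{
	int fd = openat(dirfd, name, O_CREAT | O_WRONLY | O_CLOEXEC, 0600);

	if (fd < 0)
		return -1;
	close(fd);
	return 0;
}

/*
 * Hypothetical demo: returns 0 if a file created through a directory FD
 * is visible both through that FD and through its absolute path.
 */
int demo_dirfd_resolution(void)
{
	char tmpl[] = "/tmp/dirfd-demo-XXXXXX";
	char full[128];
	int dirfd, ret = -1;

	assert(strlen(tmpl) + sizeof("/pinned") <= sizeof(full));

	if (!mkdtemp(tmpl))
		return -1;

	/* O_PATH: only a location handle is needed, not read access */
	dirfd = open(tmpl, O_PATH | O_DIRECTORY | O_CLOEXEC);
	if (dirfd < 0)
		goto out_rmdir;

	/* relative pathname: resolved against dirfd, not against the CWD */
	if (create_at(dirfd, "pinned"))
		goto out_close;

	/* absolute pathname: the FD argument is ignored */
	snprintf(full, sizeof(full), "%s/pinned", tmpl);
	if (faccessat(AT_FDCWD, full, F_OK, 0))
		goto out_unlink;

	/* the same file again, this time looked up through the FD */
	if (faccessat(dirfd, "pinned", F_OK, 0))
		goto out_unlink;

	ret = 0;
out_unlink:
	unlinkat(dirfd, "pinned", 0);
out_close:
	close(dirfd);
out_rmdir:
	rmdir(tmpl);
	return ret;
}
```

Calling demo_dirfd_resolution() from a small main() should return 0 on
a system with a writable /tmp. BPF_OBJ_PIN/BPF_OBJ_GET with path_fd
are meant to follow the same resolution rules, per the comment in the
UAPI hunk above.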
>
> I have a note that you suggested to introduce this for uprobe
> multi link as well, so I'll do something similar
>
> lgtm
>
> Acked-by: Jiri Olsa <jolsa@xxxxxxxxxx>
>
> jirka
>
> >  	};
> >
> >  	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
> > diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> > index 9948b542a470..13bb54f6bd17 100644
> > --- a/kernel/bpf/inode.c
> > +++ b/kernel/bpf/inode.c
[...]