Re: [PATCH bpf-next 1/3] bpf: support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 16 May 2023 11:02:56 -0700

On Tue, May 16, 2023 at 1:52 AM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>
> On Mon, May 15, 2023 at 05:13:46PM -0700, Andrii Nakryiko wrote:
> > Current UAPI of BPF_OBJ_PIN and BPF_OBJ_GET commands of bpf() syscall
> > forces users to specify pinning location as a string-based absolute or
> > relative (to current working directory) path. This has various
> > implications related to security (e.g., symlink-based attacks), and it
> > forces BPF FS to be exposed in the file system, which can cause races
> > with other applications.
> >
> > One piece of feedback we got from folks working heavily with containers
> > was that the inability to use a purely FD-based location specification
> > was an unfortunate limitation and hindrance for BPF_OBJ_PIN and
> > BPF_OBJ_GET commands. This patch closes this gap, adding path_fd field to
> > BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by
> > *at() syscalls for dirfd + pathname combinations.
> >
> > This now allows interesting possibilities like working with detached BPF
> > FS mount (e.g., to perform multiple pinnings without running a risk of
> > someone interfering with them), and generally making pinning/getting
> > more secure and not prone to any races and/or security attacks.
> >
> > This is demonstrated by a selftest added in a subsequent patch that uses
> > the new mount APIs (fsopen, fsconfig, fsmount) to create a detached BPF
> > FS mount, pin a BPF map into it, and then get the map back out, all
> > while never exposing this private instance of BPF FS to the outside
> > world.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> >  include/linux/bpf.h            |  4 ++--
> >  include/uapi/linux/bpf.h       |  5 +++++
> >  kernel/bpf/inode.c             | 16 ++++++++--------
> >  kernel/bpf/syscall.c           |  8 +++++---
> >  tools/include/uapi/linux/bpf.h |  5 +++++
> >  5 files changed, 25 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 36e4b2d8cca2..f58895830ada 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -2077,8 +2077,8 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
> >  struct bpf_link *bpf_link_get_from_fd(u32 ufd);
> >  struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
> >
> > -int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
> > -int bpf_obj_get_user(const char __user *pathname, int flags);
> > +int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
> > +int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
> >
> >  #define BPF_ITER_FUNC_PREFIX "bpf_iter_"
> >  #define DEFINE_BPF_ITER_FUNC(target, args...)                        \
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 1bb11a6ee667..db2870a52ce0 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -1420,6 +1420,11 @@ union bpf_attr {
> >               __aligned_u64   pathname;
> >               __u32           bpf_fd;
> >               __u32           file_flags;
> > +             /* same as dirfd in openat() syscall; see openat(2)
> > +              * manpage for details of dirfd/path_fd and pathname semantics;
> > +              * zero path_fd implies AT_FDCWD behavior
> > +              */
> > +             __u32           path_fd;
>
> I'd probably call it dir_fd to emphasize the similarity,
> but I don't mind path_fd as well

I considered that, but it's not necessarily a directory; it could be
a specific file location (with O_PATH), so I felt a more generic
"path_fd" would be better (plus we have *path*name to combine it
with). It's minor, and I can be convinced if others feel strongly about
this.
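
For readers less familiar with the *at() convention that path_fd follows, here is a minimal sketch of dirfd + pathname resolution using plain openat(2). It does not call bpf() at all; read_at_dir() is a hypothetical helper made up for illustration, and it only shows how a relative pathname is resolved against an O_PATH FD rather than against the current working directory.

```c
#define _GNU_SOURCE		/* for O_PATH */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): open 'name' relative to
 * the directory 'dir' through an O_PATH FD, then read up to 'len' bytes.
 * Returns the number of bytes read, or -1 on any failure.
 */
static ssize_t read_at_dir(const char *dir, const char *name,
			   char *buf, size_t len)
{
	int dir_fd, fd;
	ssize_t n;

	/* O_PATH yields an FD that only names a location in the tree. */
	dir_fd = open(dir, O_PATH | O_DIRECTORY | O_CLOEXEC);
	if (dir_fd < 0)
		return -1;

	/* A relative 'name' is resolved against dir_fd, not the CWD. */
	fd = openat(dir_fd, name, O_RDONLY | O_CLOEXEC);
	close(dir_fd);
	if (fd < 0)
		return -1;

	n = read(fd, buf, len);
	close(fd);
	return n;
}
```

With the patch as posted, path_fd would play the role of dir_fd above and attr.pathname the role of name, with a zero path_fd keeping the existing AT_FDCWD behavior.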

>
> I have a note that you suggested to introduce this for uprobe
> multi link as well, so I'll do something similar
>
> lgtm
>
> Acked-by: Jiri Olsa <jolsa@xxxxxxxxxx>
>
> jirka
>
> >       };
> >
> >       struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
> > diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> > index 9948b542a470..13bb54f6bd17 100644
> > --- a/kernel/bpf/inode.c
> > +++ b/kernel/bpf/inode.c

[...]