On Mon, May 22, 2023 at 04:29:15PM -0700, Andrii Nakryiko wrote:
> Current UAPI of BPF_OBJ_PIN and BPF_OBJ_GET commands of bpf() syscall
> forces users to specify pinning location as a string-based absolute or
> relative (to current working directory) path. This has various
> implications related to security (e.g., symlink-based attacks), forces
> BPF FS to be exposed in the file system, which can cause races with
> other applications.
>
> One of the feedbacks we got from folks working with containers heavily
> was that inability to use purely FD-based location specification was an
> unfortunate limitation and hindrance for BPF_OBJ_PIN and BPF_OBJ_GET
> commands. This patch closes this oversight, adding path_fd field to
> BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by
> *at() syscalls for dirfd + pathname combinations.
>
> This now allows interesting possibilities like working with detached BPF
> FS mount (e.g., to perform multiple pinnings without running a risk of
> someone interfering with them), and generally making pinning/getting
> more secure and not prone to any races and/or security attacks.
>
> This is demonstrated by a selftest added in subsequent patch that takes
> advantage of new mount APIs (fsopen, fsconfig, fsmount) to demonstrate
> creating detached BPF FS mount, pinning, and then getting BPF map out of
> it, all while never exposing this private instance of BPF FS to outside
> worlds.
>
> Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> ---

Reviewed-by: Christian Brauner <brauner@xxxxxxxxxx>