On Fri, Feb 25, 2022 at 03:43:31PM -0800, Hao Luo wrote: > This patch allows bpf_syscall prog to perform some basic filesystem > operations: create, remove directories and unlink files. Three bpf > helpers are added for this purpose. When combined with the following > patches that allow pinning and getting bpf objects from bpf prog, > this feature can be used to create directory hierarchy in bpffs that > help manage bpf objects purely using bpf progs. > > The added helpers subject to the same permission checks as their syscall > version. For example, one can not write to a read-only file system; > The identity of the current process is checked to see whether it has > sufficient permission to perform the operations. > > Only directories and files in bpffs can be created or removed by these > helpers. But it won't be too hard to allow these helpers to operate > on files in other filesystems, if we want. In which contexts can those be called? > +BPF_CALL_2(bpf_rmdir, const char *, pathname, int, pathname_sz) > +{ > + struct user_namespace *mnt_userns; > + struct path parent; > + struct dentry *dentry; > + int err; > + > + if (pathname_sz <= 1 || pathname[pathname_sz - 1]) > + return -EINVAL; > + > + err = kern_path(pathname, 0, &parent); > + if (err) > + return err; > + > + if (!bpf_path_is_bpf_dir(&parent)) { > + err = -EPERM; > + goto exit1; > + } > + > + err = mnt_want_write(parent.mnt); > + if (err) > + goto exit1; > + > + dentry = kern_path_locked(pathname, &parent); This can't be right. Ever. There is no promise whatsoever that these two lookups will resolve to the same place. > +BPF_CALL_2(bpf_unlink, const char *, pathname, int, pathname_sz) > +{ > + struct user_namespace *mnt_userns; > + struct path parent; > + struct dentry *dentry; > + struct inode *inode = NULL; > + int err; > + > + if (pathname_sz <= 1 || pathname[pathname_sz - 1]) > + return -EINVAL; > + > + err = kern_path(pathname, 0, &parent); > + if (err) > + return err; > + > + err = mnt_want_write(parent.mnt); > + if (err) > + goto exit1; > + > + dentry = kern_path_locked(pathname, &parent); > + if (IS_ERR(dentry)) { > + err = PTR_ERR(dentry); > + goto exit2; > + } Ditto. NAK; if you want to poke into fs/namei.c guts, do it right. Or at least discuss that on fsdevel. As it is, it's completely broken. It's racy *and* it blatantly leaks both vfsmount and dentry references. NAKed-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>