Hello Al, On Fri, Mar 11, 2022 at 7:46 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > On Fri, Feb 25, 2022 at 03:43:31PM -0800, Hao Luo wrote: > > This patch allows bpf_syscall prog to perform some basic filesystem > > operations: create, remove directories and unlink files. Three bpf > > helpers are added for this purpose. When combined with the following > > patches that allow pinning and getting bpf objects from bpf prog, > > this feature can be used to create directory hierarchy in bpffs that > > help manage bpf objects purely using bpf progs. > > > > The added helpers subject to the same permission checks as their syscall > > version. For example, one can not write to a read-only file system; > > The identity of the current process is checked to see whether it has > > sufficient permission to perform the operations. > > > > Only directories and files in bpffs can be created or removed by these > > helpers. But it won't be too hard to allow these helpers to operate > > on files in other filesystems, if we want. > > In which contexts can those be called? > In a sleepable context. The plan is to introduce a certain tracepoints as sleepable, a program that attaches to sleepable tracepoints is allowed to call these functions. In particular, the first sleepable tracepoint introduced in this patchset is one at the end of cgroup_mkdir(). Do you have any advices? > > +BPF_CALL_2(bpf_rmdir, const char *, pathname, int, pathname_sz) > > +{ > > + struct user_namespace *mnt_userns; > > + struct path parent; > > + struct dentry *dentry; > > + int err; > > + > > + if (pathname_sz <= 1 || pathname[pathname_sz - 1]) > > + return -EINVAL; > > + > > + err = kern_path(pathname, 0, &parent); > > + if (err) > > + return err; > > + > > + if (!bpf_path_is_bpf_dir(&parent)) { > > + err = -EPERM; > > + goto exit1; > > + } > > + > > + err = mnt_want_write(parent.mnt); > > + if (err) > > + goto exit1; > > + > > + dentry = kern_path_locked(pathname, &parent); > > This can't be right. Ever. There is no promise whatsoever > that these two lookups will resolve to the same place. > > > +BPF_CALL_2(bpf_unlink, const char *, pathname, int, pathname_sz) > > +{ > > + struct user_namespace *mnt_userns; > > + struct path parent; > > + struct dentry *dentry; > > + struct inode *inode = NULL; > > + int err; > > + > > + if (pathname_sz <= 1 || pathname[pathname_sz - 1]) > > + return -EINVAL; > > + > > + err = kern_path(pathname, 0, &parent); > > + if (err) > > + return err; > > + > > + err = mnt_want_write(parent.mnt); > > + if (err) > > + goto exit1; > > + > > + dentry = kern_path_locked(pathname, &parent); > > + if (IS_ERR(dentry)) { > > + err = PTR_ERR(dentry); > > + goto exit2; > > + } > > Ditto. NAK; if you want to poke into fs/namei.c guts, do it right. > Or at least discuss that on fsdevel. As it is, it's completely broken. > It's racy *and* it blatantly leaks both vfsmount and dentry references. > > NAKed-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> Thanks Al for taking a look. Actually, there is a simpler approach: can we export two functions in namei.c that wrap call to do_mkdirat and do_unlinkat, but take a kernel string as pathname? Then these two bpf helpers can use them, don't have to be this complicated. Does this sound good to you? Thanks!