On Sun, Nov 03, 2019 at 02:54:17AM -0500, Wenbo Zhang wrote:
> When people want to identify which file system files are being opened,
> read, and written to, they can use this helper with file descriptor as
> input to achieve this goal. Other pseudo filesystems are also supported.
>
> This requirement is mainly discussed here:
>
>   https://github.com/iovisor/bcc/issues/237
>
> v6->v7:
> - fix missing signed-off-by line
>
> v5->v6: addressed Andrii's feedback
> - avoid unnecessary goto end by having two explicit returns
>
> v4->v5: addressed Andrii and Daniel's feedback
> - rename bpf_fd2path to bpf_get_file_path to be consistent with other
>   helper's names
> - when fdget_raw fails, set ret to -EBADF instead of -EINVAL
> - remove fdput from fdget_raw's error path
> - use IS_ERR instead of IS_ERR_OR_NULL as d_path ether returns a pointer
>   into the buffer or an error code if the path was too long
> - modify the normal path's return value to return copied string length
>   including NUL
> - update this helper description's Return bits.
>
> v3->v4: addressed Daniel's feedback
> - fix missing fdput()
> - move fd2path from kernel/bpf/trace.c to kernel/trace/bpf_trace.c
> - move fd2path's test code to another patch
> - add comment to explain why use fdget_raw instead of fdget
>
> v2->v3: addressed Yonghong's feedback
> - remove unnecessary LOCKDOWN_BPF_READ
> - refactor error handling section for enhanced readability
> - provide a test case in tools/testing/selftests/bpf
>
> v1->v2: addressed Daniel's feedback
> - fix backward compatibility
> - add this helper description
> - fix signed-off name
>
> Signed-off-by: Wenbo Zhang <ethercflow@xxxxxxxxx>
> ---
>  include/uapi/linux/bpf.h       | 15 ++++++++++-
>  kernel/trace/bpf_trace.c       | 48 ++++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 15 ++++++++++-
>  3 files changed, 76 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index a6bf19dabaab..d618a914c6fe 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2777,6 +2777,18 @@ union bpf_attr {
>   *		restricted to raw_tracepoint bpf programs.
>   *	Return
>   *		0 on success, or a negative error in case of failure.
> + *
> + * int bpf_get_file_path(char *path, u32 size, int fd)
> + *	Description
> + *		Get **file** atrribute from the current task by *fd*, then call
> + *		**d_path** to get it's absolute path and copy it as string into
> + *		*path* of *size*. The **path** also support pseudo filesystems
> + *		(whether or not it can be mounted). The *size* must be strictly
> + *		positive. On success, the helper makes sure that the *path* is
> + *		NUL-terminated. On failure, it is filled with zeroes.
> + *	Return
> + *		On success, returns the length of the copied string INCLUDING
> + *		the trailing NUL, or a negative error in case of failure.
>   */
>  #define __BPF_FUNC_MAPPER(FN) \
>  	FN(unspec), \
> @@ -2890,7 +2902,8 @@ union bpf_attr {
>  	FN(sk_storage_delete), \
>  	FN(send_signal), \
>  	FN(tcp_gen_syncookie), \
> -	FN(skb_output),
> +	FN(skb_output), \
> +	FN(get_file_path),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f50bf19f7a05..41be1c5989af 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -683,6 +683,52 @@ static const struct bpf_func_proto bpf_send_signal_proto = {
>  	.arg1_type = ARG_ANYTHING,
>  };
>
> +BPF_CALL_3(bpf_get_file_path, char *, dst, u32, size, int, fd)
> +{
> +	struct fd f;
> +	char *p;
> +	int ret = -EBADF;
> +
> +	/* Use fdget_raw instead of fdget to support O_PATH, and
> +	 * fdget_raw doesn't have any sleepable code, so it's ok
> +	 * to be here.
> +	 */
> +	f = fdget_raw(fd);
> +	if (!f.file)
> +		goto error;
> +
> +	/* d_path doesn't have any sleepable code, so it's ok to
> +	 * be here. But it uses the current macro to get fs_struct
> +	 * (current->fs). So this helper shouldn't be called in
> +	 * interrupt context.
> +	 */
> +	p = d_path(&f.file->f_path, dst, size);
> +	if (IS_ERR(p)) {
> +		ret = PTR_ERR(p);
> +		fdput(f);
> +		goto error;
> +	}

This is definitely a very useful helper that the bpf tracing community
has been asking for for a long time, but I have a few concerns with the
implementation:
- fdget_raw is only used inside fs/, so it doesn't look right to skip
  the layers.
- accessing current->fs is not always correct, so the code should
  somehow check that it's ok to do so, but I'm not sure whether an
  if (in_irq()) check would be enough.
- some implementations of d_dname do sleep, for example dmabuffs_dname,
  though it seems to me that that's a bug in that particular FS.
But I'd like to hear a clear yes from VFS experts that
fdget_raw() + d_path() is ok from a preempt-disabled section.
The other alternative is to wait for sleepable and preemptible BPF
programs to appear, which is probably a month or so away. Then all
these issues will disappear.