On Thu, 31 May 2018 at 16:52, Tycho Andersen <tycho@xxxxxxxx> wrote: > > The idea here is that the userspace handler should be able to pass an fd > back to the trapped task, for example so it can be returned from socket(). > > I've proposed one API here, but I'm open to other options. In particular, > this only lets you return an fd from a syscall, which may not be enough in > all cases. For example, if an fd is written to an output parameter instead > of returned, the current API can't handle this. Another case is that > netlink takes as input fds sometimes (IFLA_NET_NS_FD, e.g.). If netlink > ever decides to install an fd and output it, we wouldn't be able to handle > this either. > > Still, the vast majority of interesting cases are covered by this API, so > perhaps it is Enough. > > I've left it as a separate commit for two reasons: > * It illustrates the way in which we would grow struct seccomp_notif and > struct seccomp_notif_resp without using netlink > * It shows just how little code is needed to accomplish this :) > > v2: new in v2 > v3: no changes > > Signed-off-by: Tycho Andersen <tycho@xxxxxxxx> > CC: Kees Cook <keescook@xxxxxxxxxxxx> > CC: Andy Lutomirski <luto@xxxxxxxxxxxxxx> > CC: Oleg Nesterov <oleg@xxxxxxxxxx> > CC: Eric W. Biederman <ebiederm@xxxxxxxxxxxx> > CC: "Serge E. Hallyn" <serge@xxxxxxxxxx> > CC: Christian Brauner <christian.brauner@xxxxxxxxxx> > CC: Tyler Hicks <tyhicks@xxxxxxxxxxxxx> > CC: Akihiro Suda <suda.akihiro@xxxxxxxxxxxxx> > --- > include/uapi/linux/seccomp.h | 2 + > kernel/seccomp.c | 49 +++++++- > tools/testing/selftests/seccomp/seccomp_bpf.c | 112 ++++++++++++++++++ > 3 files changed, 161 insertions(+), 2 deletions(-) > > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h > index 8160e6cad528..3124427219cb 100644 > --- a/include/uapi/linux/seccomp.h > +++ b/include/uapi/linux/seccomp.h > @@ -71,6 +71,8 @@ struct seccomp_notif_resp { > __u64 id; > __s32 error; > __s64 val; > + __u8 return_fd; > + __u32 fd; > }; > > #endif /* _UAPI_LINUX_SECCOMP_H */ > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > index 6dc99c65c2f4..2ee958b3efde 100644 > --- a/kernel/seccomp.c > +++ b/kernel/seccomp.c > @@ -77,6 +77,8 @@ struct seccomp_knotif { > /* The return values, only valid when in SECCOMP_NOTIFY_REPLIED */ > int error; > long val; > + struct file *file; > + unsigned int flags; > > /* Signals when this has entered SECCOMP_NOTIFY_REPLIED */ > struct completion ready; > @@ -780,10 +782,32 @@ static void seccomp_do_user_notification(int this_syscall, > goto remove_list; > } > > - ret = n.val; > - err = n.error; > + if (n.file) { > + int fd; > + > + fd = get_unused_fd_flags(n.flags); > + if (fd < 0) { > + err = fd; > + ret = -1; > + goto remove_list; > + } > + > + ret = fd; > + err = 0; > + > + fd_install(fd, n.file); > + /* Don't fput, since fd has a reference now */ > + n.file = NULL; Do we want the cgroup classid and netprio to be applied here, before the fd_install? I am looking at similar code in net/core/scm.c scm_detach_fds(): sock = sock_from_file(fp[i], &err); if (sock) { sock_update_netprioidx(&sock->sk->sk_cgrp_data); sock_update_classid(&sock->sk->sk_cgrp_data); } The listener process might live in a different cgroup with a different classid & netprio, so it might not be applied as the app might expect. Also, should we update the struct sock_cgroup_data of the socket, in order to make the BPF helper function bpf_skb_under_cgroup() work wrt the cgroup of the target process instead of the listener process? I am looking at cgroup_sk_alloc(). I don't know what's the correct behaviour we want here. > + } else { > + ret = n.val; > + err = n.error; > + } > + > > remove_list: > + if (n.file) > + fput(n.file); > + > list_del(&n.list); > out: > mutex_unlock(&match->notify_lock); > @@ -1598,6 +1622,27 @@ static ssize_t seccomp_notify_write(struct file *file, const char __user *buf, > knotif->state = SECCOMP_NOTIFY_REPLIED; > knotif->error = resp.error; > knotif->val = resp.val; > + > + if (resp.return_fd) { > + struct fd fd; > + > + /* > + * This is a little hokey: we need a real fget() (i.e. not > + * __fget_light(), which is what fdget does), but we also need > + * the flags from strcut fd. So, we get it, put it, and get it > + * again for real. > + */ > + fd = fdget(resp.fd); > + knotif->flags = fd.flags; > + fdput(fd); > + > + knotif->file = fget(resp.fd); > + if (!knotif->file) { > + ret = -EBADF; > + goto out; Should the "knotif->state = SECCOMP_NOTIFY_REPLIED" and other changes be done after the error case here? In case of bad fd, it seems to return -EBADF the first time and -EINVAL the next time because the state would have been changed to SECCOMP_NOTIFY_REPLIED already. > + } > + } > + > complete(&knotif->ready); > out: > mutex_unlock(&filter->notify_lock); > diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c > index 07984f7bd601..b9a4f676566d 100644 > --- a/tools/testing/selftests/seccomp/seccomp_bpf.c > +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c > @@ -167,6 +167,8 @@ struct seccomp_notif_resp { > __u64 id; > __s32 error; > __s64 val; > + __u8 return_fd; > + __u32 fd; > }; > #endif > > @@ -3176,6 +3178,116 @@ TEST(get_user_notification_ptrace) > close(listener); > } > > +TEST(user_notification_pass_fd) > +{ > + pid_t pid; > + int status, listener; > + int sk_pair[2]; > + char c; > + struct seccomp_notif req = {}; > + struct seccomp_notif_resp resp = {}; > + long ret; > + > + ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0); > + > + pid = fork(); > + ASSERT_GE(pid, 0); > + > + if (pid == 0) { > + int fd; > + char buf[16]; > + > + EXPECT_EQ(user_trap_syscall(__NR_getpid, 0), 0); > + > + /* Signal we're ready and have installed the filter. */ > + EXPECT_EQ(write(sk_pair[1], "J", 1), 1); > + > + EXPECT_EQ(read(sk_pair[1], &c, 1), 1); > + EXPECT_EQ(c, 'H'); > + close(sk_pair[1]); > + > + /* An fd from getpid(). Let the games begin. */ > + fd = syscall(__NR_getpid); > + EXPECT_GT(fd, 0); > + EXPECT_EQ(read(fd, buf, sizeof(buf)), 12); > + close(fd); > + > + exit(strcmp("hello world", buf)); > + } > + > + EXPECT_EQ(read(sk_pair[0], &c, 1), 1); > + EXPECT_EQ(c, 'J'); > + > + EXPECT_EQ(ptrace(PTRACE_ATTACH, pid), 0); > + EXPECT_EQ(waitpid(pid, NULL, 0), pid); > + listener = ptrace(PTRACE_SECCOMP_GET_LISTENER, pid, 0); > + EXPECT_GE(listener, 0); > + EXPECT_EQ(ptrace(PTRACE_DETACH, pid, NULL, 0), 0); > + > + /* Now signal we are done installing so it can do a getpid */ > + EXPECT_EQ(write(sk_pair[0], "H", 1), 1); > + close(sk_pair[0]); > + > + /* Make a new socket pair so we can send half across */ > + EXPECT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0); > + > + ret = read_notif(listener, &req); > + EXPECT_EQ(ret, sizeof(req)); > + EXPECT_EQ(errno, 0); > + > + resp.id = req.id; > + resp.return_fd = 1; > + resp.fd = sk_pair[1]; > + EXPECT_EQ(write(listener, &resp, sizeof(resp)), sizeof(resp)); > + close(sk_pair[1]); > + > + EXPECT_EQ(write(sk_pair[0], "hello world\0", 12), 12); > + close(sk_pair[0]); > + > + EXPECT_EQ(waitpid(pid, &status, 0), pid); > + EXPECT_EQ(true, WIFEXITED(status)); > + EXPECT_EQ(0, WEXITSTATUS(status)); > + close(listener); > +} > + > +TEST(user_notification_struct_size_mismatch) > +{ > + pid_t pid; > + long ret; > + int status, listener, len; > + struct seccomp_notif req; > + struct seccomp_notif_resp resp; > + > + listener = user_trap_syscall(__NR_getpid, > + SECCOMP_FILTER_FLAG_GET_LISTENER); > + EXPECT_GE(listener, 0); > + > + pid = fork(); > + ASSERT_GE(pid, 0); > + > + if (pid == 0) { > + ret = syscall(__NR_getpid); > + exit(ret != USER_NOTIF_MAGIC); > + } > + > + EXPECT_EQ(read(listener, &req, sizeof(req)), sizeof(req)); > + > + resp.id = req.id; > + resp.error = 0; > + resp.val = USER_NOTIF_MAGIC; > + > + /* > + * Only write a partial structure: this is what was available before we > + * had fd support. > + */ > + len = offsetof(struct seccomp_notif_resp, val) + sizeof(resp.val); > + EXPECT_EQ(write(listener, &resp, len), len); > + > + EXPECT_EQ(waitpid(pid, &status, 0), pid); > + EXPECT_EQ(true, WIFEXITED(status)); > + EXPECT_EQ(0, WEXITSTATUS(status)); > +} > + > /* > * TODO: > * - add microbenchmarks > -- > 2.17.0 > _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers