On Wed, Sep 19, 2018 at 12:58:20PM -0700, Andy Lutomirski wrote: > On Wed, Sep 19, 2018 at 7:38 AM, Tycho Andersen <tycho@xxxxxxxx> wrote: > > On Wed, Sep 19, 2018 at 07:19:56AM -0700, Andy Lutomirski wrote: > >> > >> > >> > On Sep 19, 2018, at 2:55 AM, Tycho Andersen <tycho@xxxxxxxx> wrote: > >> > > >> >> On Wed, Sep 12, 2018 at 04:52:38PM -0700, Andy Lutomirski wrote: > >> >>> On Thu, Sep 6, 2018 at 8:28 AM, Tycho Andersen <tycho@xxxxxxxx> wrote: > >> >>> The idea here is that the userspace handler should be able to pass an fd > >> >>> back to the trapped task, for example so it can be returned from socket(). > >> >>> > >> >>> I've proposed one API here, but I'm open to other options. In particular, > >> >>> this only lets you return an fd from a syscall, which may not be enough in > >> >>> all cases. For example, if an fd is written to an output parameter instead > >> >>> of returned, the current API can't handle this. Another case is that > >> >>> netlink takes as input fds sometimes (IFLA_NET_NS_FD, e.g.). If netlink > >> >>> ever decides to install an fd and output it, we wouldn't be able to handle > >> >>> this either. > >> >> > >> >> An alternative could be to have an API (an ioctl on the listener, > >> >> perhaps) that just copies an fd into the tracee. There would be the > >> >> obvious set of options: do we replace an existing fd or allocate a new > >> >> one, and is it CLOEXEC. Then the tracer could add an fd and then > >> >> return it just like it's a regular number. > >> >> > >> >> I feel like this would be more flexible and conceptually simpler, but > >> >> maybe a little slower for the common cases. What do you think? > >> > > >> > I'm just implementing this now, and there's one question: when do we > >> > actually do the fd install? Should we do it when the user calls > >> > SECCOMP_NOTIF_PUT_FD, or when the actual response is sent? It feels > >> > like we should do it when the response is sent, instead of doing it > >> > right when SECCOMP_NOTIF_PUT_FD is called, since if there's a > >> > subsequent signal and the tracer decides to discard the response, > >> > we'll have to implement some delete mechanism to delete the fd, but it > >> > would have already been visible to the process, etc. So I'll go > >> > forward with this unless there are strong objections, but I thought > >> > I'd point it out just to avoid another round trip. > >> > > >> > > >> > >> Can you do that non-racily? That is, you need to commit to an fd *number* right away, but what if another thread uses the number before you actually install the fd? > > > > I was thinking we could just do an __alloc_fd() and then do the > > fd_install() when the response is sent or clean up the case that the > > listener or task dies. I haven't actually tried to run the code yet, > > so it's possible the locking won't work :) > > I would be very surprised if the locking works. How can you run a > thread in a process when another thread has allocated but not > installed an fd and is blocked for an arbitrarily long time? I think the trick is that there's no actual locking required (except for a brief locking of task->files). I've run the patch below and it seems to work. But perhaps that's abusing __alloc_fd a little too hard, I don't really know. > > > >> Do we really allow non-“kill” signals to interrupt the whole process? It might be the case that we don’t really need to clean up from signals if there’s a guarantee that the thread dies. > > > > Yes, we do, because of this: https://lkml.org/lkml/2018/3/15/1122 > > > > I'm still not sure I see the problem. Suppose I'm implementing a user > notifier for a nasty syscall like recvmsg(). If I'm the tracer, by > the time I decide to install an fd, I've committed to returning > something other than -EINTR, even if a non-fatal signal is sent before > I finish. No rollback should be necessary. I don't understand why this is true. Surely you could stop a handler on receipt of a new signal, and have it do something else entirely? > In the (unlikely?) event that some tracer needs to be able to rollback > an fd installation to return -EINTR, a SECCOMP_NOTIF_CLOSE_FD > operation should be good enough, I think. Or maybe PUT_FD can put -1 > to delete an fd. Yes, I think even with something like what I did below we'd need some sort of REMOVE_FD option, because otherwise there's no way to change your mind and send -EINTR without the fd you just PUT_FD'd. Tycho >From bfca7337cb53791aca74b595eb45e9afa3babac2 Mon Sep 17 00:00:00 2001 From: Tycho Andersen <tycho@xxxxxxxx> Date: Thu, 20 Sep 2018 06:49:49 -0600 Subject: [PATCH] implement SECCOMP_NOTIF_PUT_FD ioctl Signed-off-by: Tycho Andersen <tycho@xxxxxxxx> --- include/uapi/linux/seccomp.h | 8 ++ kernel/seccomp.c | 74 ++++++++++++------- tools/testing/selftests/seccomp/seccomp_bpf.c | 24 +++++- 3 files changed, 77 insertions(+), 29 deletions(-) diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 8fb2c024c0a1..62e474c372d4 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -80,6 +80,12 @@ struct seccomp_notif_resp { __u32 fd_flags; }; +struct seccomp_notif_put_fd { + __u64 id; + __s32 fd; + __u32 fd_flags; +}; + #define SECCOMP_IOC_MAGIC 0xF7 /* Flags for seccomp notification fd ioctl. */ @@ -89,5 +95,7 @@ struct seccomp_notif_resp { struct seccomp_notif_resp) #define SECCOMP_NOTIF_ID_VALID _IOR(SECCOMP_IOC_MAGIC, 2, \ __u64) +#define SECCOMP_NOTIF_PUT_FD _IOR(SECCOMP_IOC_MAGIC, 3, \ + struct seccomp_notif_put_fd) #endif /* _UAPI_LINUX_SECCOMP_H */ diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 21b24cc07237..6bdf413863ca 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -41,6 +41,7 @@ #include <linux/tracehook.h> #include <linux/uaccess.h> #include <linux/anon_inodes.h> +#include <linux/fdtable.h> #include <net/cls_cgroup.h> enum notify_state { @@ -51,6 +52,7 @@ enum notify_state { struct seccomp_knotif { /* The struct pid of the task whose filter triggered the notification */ + struct task_struct *task; struct pid *pid; /* The "cookie" for this request; this is unique for this filter. */ @@ -80,7 +82,7 @@ struct seccomp_knotif { int error; long val; struct file *file; - unsigned int flags; + int fd; /* Signals when this has entered SECCOMP_NOTIFY_REPLIED */ struct completion ready; @@ -748,9 +750,11 @@ static void seccomp_do_user_notification(int this_syscall, if (!match->notif) goto out; + n.task = current; n.pid = task_pid(current); n.state = SECCOMP_NOTIFY_INIT; n.data = sd; + n.fd = -1; n.id = seccomp_next_notify_id(match); init_completion(&n.ready); @@ -786,16 +790,8 @@ static void seccomp_do_user_notification(int this_syscall, } if (n.file) { - int fd; struct socket *sock; - fd = get_unused_fd_flags(n.flags); - if (fd < 0) { - err = fd; - ret = -1; - goto remove_list; - } - /* * Similar to what SCM_RIGHTS does, let's re-set the cgroup * data to point ot the tracee's cgroups instead of the @@ -807,21 +803,20 @@ static void seccomp_do_user_notification(int this_syscall, sock_update_classid(&sock->sk->sk_cgrp_data); } - ret = fd; - err = 0; - - fd_install(fd, n.file); + fd_install(n.fd, n.file); /* Don't fput, since fd has a reference now */ n.file = NULL; - } else { - ret = n.val; - err = n.error; + n.fd = -1; } + ret = n.val; + err = n.error; remove_list: if (n.file) fput(n.file); + if (n.fd >= 0) + put_unused_fd(n.fd); list_del(&n.list); out: @@ -1683,15 +1678,6 @@ static long seccomp_notify_send(struct seccomp_filter *filter, goto out; } - if (resp.return_fd) { - knotif->flags = resp.fd_flags; - knotif->file = fget(resp.fd); - if (!knotif->file) { - ret = -EBADF; - goto out; - } - } - ret = size; knotif->state = SECCOMP_NOTIFY_REPLIED; knotif->error = resp.error; @@ -1731,6 +1717,42 @@ static long seccomp_notify_id_valid(struct seccomp_filter *filter, return ret; } +static long seccomp_notify_put_fd(struct seccomp_filter *filter, + unsigned long arg) +{ + struct seccomp_notif_put_fd req; + void __user *buf = (void __user *)arg; + struct seccomp_knotif *knotif = NULL; + long ret; + + if (copy_from_user(&req, buf, sizeof(req))) + return -EFAULT; + + ret = mutex_lock_interruptible(&filter->notify_lock); + if (ret < 0) + return ret; + + ret = -ENOENT; + list_for_each_entry(knotif, &filter->notif->notifications, list) { + unsigned long max_files; + + if (knotif->id != req.id) + continue; + + max_files = task_rlimit(knotif->task, RLIMIT_NOFILE); + + knotif->file = fget(req.fd); + knotif->fd = __alloc_fd(knotif->task->files, 0, max_files, + req.fd_flags); + + ret = knotif->fd; + break; + } + + mutex_unlock(&filter->notify_lock); + return ret; +} + static long seccomp_notify_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -1743,6 +1765,8 @@ static long seccomp_notify_ioctl(struct file *file, unsigned int cmd, return seccomp_notify_send(filter, arg); case SECCOMP_NOTIF_ID_VALID: return seccomp_notify_id_valid(filter, arg); + case SECCOMP_NOTIF_PUT_FD: + return seccomp_notify_put_fd(filter, arg); default: return -EINVAL; } diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 3dec856717a7..ae8daf992231 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -169,6 +169,9 @@ struct seccomp_metadata { struct seccomp_notif_resp) #define SECCOMP_NOTIF_ID_VALID _IOR(SECCOMP_IOC_MAGIC, 2, \ __u64) +#define SECCOMP_NOTIF_PUT_FD _IOR(SECCOMP_IOC_MAGIC, 3, \ + struct seccomp_notif_put_fd) + struct seccomp_notif { __u16 len; __u64 id; @@ -186,6 +189,12 @@ struct seccomp_notif_resp { __u32 fd; __u32 fd_flags; }; + +struct seccomp_notif_put_fd { + __u64 id; + __s32 fd; + __u32 fd_flags; +}; #endif #ifndef seccomp @@ -3239,11 +3248,12 @@ TEST(get_user_notification_ptrace) TEST(user_notification_pass_fd) { pid_t pid; - int status, listener; + int status, listener, fd; int sk_pair[2]; char c; struct seccomp_notif req = {}; struct seccomp_notif_resp resp = {}; + struct seccomp_notif_put_fd putfd = {}; long ret; ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0); @@ -3295,9 +3305,15 @@ TEST(user_notification_pass_fd) resp.len = sizeof(resp); resp.id = req.id; - resp.return_fd = 1; - resp.fd = sk_pair[1]; - resp.fd_flags = 0; + + putfd.id = req.id; + putfd.fd = sk_pair[1]; + putfd.fd_flags = 0; + + fd = ioctl(listener, SECCOMP_NOTIF_PUT_FD, &putfd); + EXPECT_GE(fd, 0); + resp.val = fd; + EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_SEND, &resp), sizeof(resp)); close(sk_pair[1]); -- 2.17.1