On Sun, Dec 9, 2018 at 10:24 AM Tycho Andersen <tycho@xxxxxxxx> wrote:
>
> This patch introduces a means for syscalls matched in seccomp to notify
> some other task that a particular filter has been triggered.
>
> The motivation for this is primarily for use with containers. For example,
> if a container does an init_module(), we obviously don't want to load this
> untrusted code, which may be compiled for the wrong version of the kernel
> anyway. Instead, we could parse the module image, figure out which module
> the container is trying to load and load it on the host.
>
> As another example, containers cannot mount() in general since various
> filesystems assume a trusted image. However, if an orchestrator knows that
> e.g. a particular block device has not been exposed to a container for
> writing, it may want to allow the container to mount that block device (that
> is, handle the mount for it).
>
> This patch adds functionality that is already possible via at least two
> other means that I know about, both of which involve ptrace(): first, one
> could ptrace attach, and then iterate through syscalls via PTRACE_SYSCALL.
> Unfortunately this is slow, so a faster version would be to install a
> filter that does SECCOMP_RET_TRACE, which triggers a PTRACE_EVENT_SECCOMP.
> Since ptrace allows only one tracer, if the container runtime is that
> tracer, users inside the container (or outside) trying to debug it will not
> be able to use ptrace, which is annoying. It also means that older
> distributions based on Upstart cannot boot inside containers using ptrace,
> since upstart itself uses ptrace to monitor services while starting.
>
> The actual implementation of this is fairly small, although getting the
> synchronization right was/is slightly complex.
>
> Finally, it's worth noting that the classic seccomp TOCTOU of reading
> memory data from the task still applies here, but can be avoided with
> careful design of the userspace handler: if the userspace handler reads all
> of the task memory that is necessary before applying its security policy,
> the tracee's subsequent memory edits will not be read by the tracer.
>
> Signed-off-by: Tycho Andersen <tycho@xxxxxxxx>
> CC: Kees Cook <keescook@xxxxxxxxxxxx>
> CC: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> CC: Oleg Nesterov <oleg@xxxxxxxxxx>
> CC: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
> CC: "Serge E. Hallyn" <serge@xxxxxxxxxx>
> Acked-by: Serge Hallyn <serge@xxxxxxxxxx>
> CC: Christian Brauner <christian@xxxxxxxxxx>
> CC: Tyler Hicks <tyhicks@xxxxxxxxxxxxx>
> CC: Akihiro Suda <suda.akihiro@xxxxxxxxxxxxx>

This takes care of everything I mentioned (and has incorporated LOTS of
people's suggestions), so I think it's ready for -next. I've applied
this and am doing local testing now.

Thanks for keeping with this!
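For anyone reading along without the patch open, the notification lifecycle
described in the quoted kernel/seccomp.c comments (INIT, then SENT once a
listener has received it, then REPLIED) can be modeled in plain userspace C.
This is only a sketch under stated assumptions, not the kernel code: the
model_* names and the flat table are hypothetical, there is no locking, and
the signal path is left out. The return codes follow the quoted
seccomp_notify_send(): -ENOENT for an unknown id, -EINPROGRESS unless the
notification is currently in SENT.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; every model_* name is hypothetical, not kernel API. */
enum model_state { MODEL_INIT, MODEL_SENT, MODEL_REPLIED };

struct model_notif {
	uint64_t id;            /* per-filter cookie */
	enum model_state state; /* INIT -> SENT -> REPLIED */
	int error;              /* valid only in MODEL_REPLIED */
	long val;               /* valid only in MODEL_REPLIED */
};

/* Models the recv path: hand out the first notification still in INIT. */
static struct model_notif *model_recv(struct model_notif *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].state == MODEL_INIT) {
			tbl[i].state = MODEL_SENT;
			return &tbl[i];
		}
	}
	return NULL; /* the quoted patch reports -ENOENT here */
}

/* Models the send path: one reply per id, and only after it was received. */
static int model_send(struct model_notif *tbl, size_t n, uint64_t id,
		      int error, long val)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].id != id)
			continue;
		if (tbl[i].state != MODEL_SENT)
			return -EINPROGRESS;
		tbl[i].state = MODEL_REPLIED;
		tbl[i].error = error;
		tbl[i].val = val;
		return 0;
	}
	return -ENOENT;
}
```

With a table holding ids 7 and 8, model_send() on id 7 before model_recv()
has handed it out yields -EINPROGRESS, and so does a second reply after the
first one succeeds.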
-Kees

> ---
> v2: * make id a u64; the idea here being that it will never overflow,
>       because 64 is huge (one syscall every nanosecond => wrap every 584
>       years) (Andy)
>     * prevent nesting of user notifications: if someone is already attached
>       to the tree in one place, nobody else can attach to the tree (Andy)
>     * notify the listener of signals the tracee receives as well (Andy)
>     * implement poll
> v3: * lockdep fix (Oleg)
>     * drop unnecessary WARN()s (Christian)
>     * rearrange error returns to be more pretty (Christian)
>     * fix build in !CONFIG_SECCOMP_USER_NOTIFICATION case
> v4: * fix implementation of poll to use poll_wait() (Jann)
>     * change listener's fd flags to be 0 (Jann)
>     * hoist filter initialization out of ifdefs to its own function
>       init_user_notification()
>     * add some more testing around poll() and closing the listener while a
>       syscall is in action
>     * s/GET_LISTENER/NEW_LISTENER, since you can't _get_ a listener, but it
>       creates a new one (Matthew)
>     * correctly handle pid namespaces, add some testcases (Matthew)
>     * use EINPROGRESS instead of EINVAL when a notification response is
>       written twice (Matthew)
>     * fix comment typo from older version (SEND vs READ) (Matthew)
>     * whitespace and logic simplification (Tobin)
>     * add some Documentation/ bits on userspace trapping
> v5: * fix documentation typos (Jann)
>     * add signalled field to struct seccomp_notif (Jann)
>     * switch to using ioctls instead of read()/write() for struct passing
>       (Jann)
>     * add an ioctl to ensure an id is still valid
> v6: * docs typo fixes, update docs for ioctl() change (Christian)
> v7: * switch struct seccomp_knotif's id member to a u64 (derp :)
>     * use notify_lock in IS_ID_VALID query to avoid racing
>     * s/signalled/signaled (Tyler)
>     * fix docs to reflect that ids are not globally unique (Tyler)
>     * add a test to check -ERESTARTSYS behavior (Tyler)
>     * drop CONFIG_SECCOMP_USER_NOTIFICATION (Tyler)
>     * reorder USER_NOTIF in seccomp return codes list (Tyler)
>     * return size instead of sizeof(struct user_notif) (Tyler)
>     * ENOENT instead of EINVAL when invalid id is passed (Tyler)
>     * drop CONFIG_SECCOMP_USER_NOTIFICATION guards (Tyler)
>     * s/IS_ID_VALID/ID_VALID and switch ioctl to be "well behaved" (Tyler)
>     * add a new struct notification to minimize the additions to
>       struct seccomp_filter, also pack the necessary additions a bit more
>       cleverly (Tyler)
>     * switch to keeping track of the task itself instead of the pid (we'll
>       use this for implementing PUT_FD)
> v8: * in recv, don't copy_to_user() while holding notify lock, in case
>       userfaultfd blocks and causes all syscalls to block (Kees)
>     * switch ioctl character to something more fun ! (Kees)
>     * switch ioctl defines to use their own SECCOMP_IO* macros (Kees)
>     * rename seccomp ioctls to be SECCOMP_IOCTL_* (Kees)
>     * move comment of notify_lock to the right place (Jann)
>     * drop comment about reference count bounding in __get_seccomp_filter (Jann)
>     * add lockdep_assert_held() in seccomp_next_notify_id() (Kees)
>     * in seccomp_do_user_notification(), always increment semaphore before
>       releasing lock, to prevent use after free of ->notif (Kees)
>     * add another wake_up_poll() when a signal is received (Jann)
>     * make all listener fds O_CLOEXEC (Jann/Kees)
>     * use memset() instead of = {} initialization for structures (Kees)
>     * move casting of buf pointer to ioctl, instead of in handler functions (Kees)
>     * fix ENOENT testing in seccomp_notify_send() (Jann)
>     * use ENOENT instead of -1 (EPERM) for ID_VALID ioctl (Jann)
>     * use ()s around "nested" bit operations (Kees)
>     * init struct notification members in the order they're declared (Jann)
>     * rearrange things so no forward declaration of init_listener() is
>       required (Kees)
>     * switch to a flags based future-proofing mechanism for struct
>       seccomp_notif and seccomp_notif_resp, thus avoiding version issues
>       with structure length (Kees)
>     * fix a memory leak in init_listener() in a failure case
>     * fix a use-after-free of filter->notif in do_user_notification() when
>       the listener fd is closed after a signal is sent
>     * add a comment about semaphore state in the interrupt case in
>       do_user_notification() + seccomp_notify_recv()
> v9: * add SECCOMP_GET_NOTIF_SIZES to handle when struct seccomp_data
>       changes in size
>     * don't do locking all the way up the seccomp tree (Oleg)
>     * rearrange the tests so that one test tests one thing
>     * avoid an unkillable sleep by dropping the signaled flag (Oleg)
> ---
>  Documentation/ioctl/ioctl-number.txt          |   1 +
>  .../userspace-api/seccomp_filter.rst          |  84 ++++
>  include/linux/seccomp.h                       |   7 +-
>  include/uapi/linux/seccomp.h                  |  40 +-
>  kernel/seccomp.c                              | 448 +++++++++++++++++-
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 447 ++++++++++++++++-
>  6 files changed, 1017 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
> index af6f6ba1fe80..c9558146ac58 100644
> --- a/Documentation/ioctl/ioctl-number.txt
> +++ b/Documentation/ioctl/ioctl-number.txt
> @@ -79,6 +79,7 @@ Code  Seq#(hex)  Include File  Comments
>  0x1b  all    InfiniBand Subsystem  <http://infiniband.sourceforge.net/>
>  0x20  all    drivers/cdrom/cm206.h
>  0x22  all    scsi/sg.h
> +'!'   00-1F  uapi/linux/seccomp.h
>  '#'   00-3F  IEEE 1394 Subsystem  Block for the entire subsystem
>  '$'   00-0F  linux/perf_counter.h, linux/perf_event.h
>  '%'   00-0F  include/uapi/linux/stm.h
> diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst
> index 82a468bc7560..b1b846d8a094 100644
> --- a/Documentation/userspace-api/seccomp_filter.rst
> +++ b/Documentation/userspace-api/seccomp_filter.rst
> @@ -122,6 +122,11 @@ In precedence order, they are:
>  	Results in the lower 16-bits of the return value being passed
>  	to userland as the errno without executing the system call.
>
> +``SECCOMP_RET_USER_NOTIF``:
> +	Results in a ``struct seccomp_notif`` message sent on the userspace
> +	notification fd, if it is attached, or ``-ENOSYS`` if it is not. See below
> +	for a discussion of how to handle user notifications.
> +
>  ``SECCOMP_RET_TRACE``:
>  	When returned, this value will cause the kernel to attempt to
>  	notify a ``ptrace()``-based tracer prior to executing the system
> @@ -183,6 +188,85 @@ The ``samples/seccomp/`` directory contains both an x86-specific example
>  and a more generic example of a higher level macro interface for BPF
>  program generation.
>
> +Userspace Notification
> +======================
> +
> +The ``SECCOMP_RET_USER_NOTIF`` return code lets seccomp filters pass a
> +particular syscall to userspace to be handled. This may be useful for
> +applications like container managers, which wish to intercept particular
> +syscalls (``mount()``, ``finit_module()``, etc.) and change their behavior.
> +
> +To acquire a notification FD, use the ``SECCOMP_FILTER_FLAG_NEW_LISTENER``
> +argument to the ``seccomp()`` syscall:
> +
> +.. code-block:: c
> +
> +    fd = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
> +
> +which (on success) will return a listener fd for the filter, which can then be
> +passed around via ``SCM_RIGHTS`` or similar. Note that filter fds correspond to
> +a particular filter, and not a particular task. So if this task then forks,
> +notifications from both tasks will appear on the same filter fd. Reads and
> +writes to/from a filter fd are also synchronized, so a filter fd can safely
> +have many readers.
> +
> +The interface for a seccomp notification fd consists of two structures:
> +
> +.. code-block:: c
> +
> +    struct seccomp_notif_sizes {
> +        __u16 seccomp_notif;
> +        __u16 seccomp_notif_resp;
> +        __u16 seccomp_data;
> +    };
> +
> +    struct seccomp_notif {
> +        __u64 id;
> +        __u32 pid;
> +        __u32 flags;
> +        struct seccomp_data data;
> +    };
> +
> +    struct seccomp_notif_resp {
> +        __u64 id;
> +        __s64 val;
> +        __s32 error;
> +        __u32 flags;
> +    };
> +
> +The ``struct seccomp_notif_sizes`` structure can be used to determine the size
> +of the various structures used in seccomp notifications. The size of ``struct
> +seccomp_data`` may change in the future, so code should use:
> +
> +.. code-block:: c
> +
> +    struct seccomp_notif_sizes sizes;
> +    seccomp(SECCOMP_GET_NOTIF_SIZES, 0, &sizes);
> +
> +to determine the size of the various structures to allocate. See
> +samples/seccomp/user-trap.c for an example.
> +
> +Users can read via ``ioctl(SECCOMP_IOCTL_NOTIF_RECV)`` (or ``poll()``) on a
> +seccomp notification fd to receive a ``struct seccomp_notif``, which contains
> +four members: a unique-per-filter ``id``, the ``pid`` of the task which
> +triggered this request (which may be 0 if the task is in a pid ns not visible
> +from the listener's pid namespace), a ``flags`` member which is reserved for
> +future use and is currently always 0, and the ``data`` passed to seccomp.
> +Userspace can then make a decision based on this information about what to
> +do, and ``ioctl(SECCOMP_IOCTL_NOTIF_SEND)`` a response, indicating what
> +should be returned to the task that made the syscall. The response carries a
> +return value ``val`` and an ``error``; its ``flags`` must be 0. The ``id``
> +member of ``struct seccomp_notif_resp`` should be the same ``id`` as in
> +``struct seccomp_notif``.
> +
> +It is worth noting that ``struct seccomp_data`` contains the values of register
> +arguments to the syscall, but does not contain pointers to memory. The task's
> +memory is accessible to suitably privileged tracers via ``ptrace()`` or
> +``/proc/pid/mem``. However, care should be taken to avoid the TOCTOU mentioned
> +above in this document: all arguments being read from the tracee's memory
> +should be read into the tracer's memory before any policy decisions are made.
> +This allows for an atomic decision on syscall arguments.
> +
>  Sysctls
>  =======
>
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index b5103c019cf4..84868d37b35d 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -4,9 +4,10 @@
>
>  #include <uapi/linux/seccomp.h>
>
> -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \
> -                                  SECCOMP_FILTER_FLAG_LOG | \
> -                                  SECCOMP_FILTER_FLAG_SPEC_ALLOW)
> +#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \
> +                                  SECCOMP_FILTER_FLAG_LOG | \
> +                                  SECCOMP_FILTER_FLAG_SPEC_ALLOW | \
> +                                  SECCOMP_FILTER_FLAG_NEW_LISTENER)
>
>  #ifdef CONFIG_SECCOMP
>
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 9efc0e73d50b..90734aa5aa36 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -15,11 +15,13 @@
>  #define SECCOMP_SET_MODE_STRICT 0
>  #define SECCOMP_SET_MODE_FILTER 1
>  #define SECCOMP_GET_ACTION_AVAIL 2
> +#define SECCOMP_GET_NOTIF_SIZES 3
>
>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
> -#define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0)
> -#define SECCOMP_FILTER_FLAG_LOG (1UL << 1)
> -#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2)
> +#define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0)
> +#define SECCOMP_FILTER_FLAG_LOG (1UL << 1)
> +#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2)
> +#define SECCOMP_FILTER_FLAG_NEW_LISTENER (1UL << 3)
>
>  /*
>   * All BPF programs must return a 32-bit value.
> @@ -35,6 +37,7 @@ > #define SECCOMP_RET_KILL SECCOMP_RET_KILL_THREAD > #define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */ > #define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */ > +#define SECCOMP_RET_USER_NOTIF 0x7fc00000U /* notifies userspace */ > #define SECCOMP_RET_TRACE 0x7ff00000U /* pass to a tracer or disallow */ > #define SECCOMP_RET_LOG 0x7ffc0000U /* allow after logging */ > #define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */ > @@ -60,4 +63,35 @@ struct seccomp_data { > __u64 args[6]; > }; > > +struct seccomp_notif_sizes { > + __u16 seccomp_notif; > + __u16 seccomp_notif_resp; > + __u16 seccomp_data; > +}; > + > +struct seccomp_notif { > + __u64 id; > + __u32 pid; > + __u32 flags; > + struct seccomp_data data; > +}; > + > +struct seccomp_notif_resp { > + __u64 id; > + __s64 val; > + __s32 error; > + __u32 flags; > +}; > + > +#define SECCOMP_IOC_MAGIC '!' > +#define SECCOMP_IO(nr) _IO(SECCOMP_IOC_MAGIC, nr) > +#define SECCOMP_IOR(nr, type) _IOR(SECCOMP_IOC_MAGIC, nr, type) > +#define SECCOMP_IOW(nr, type) _IOW(SECCOMP_IOC_MAGIC, nr, type) > +#define SECCOMP_IOWR(nr, type) _IOWR(SECCOMP_IOC_MAGIC, nr, type) > + > +/* Flags for seccomp notification fd ioctl. 
*/ > +#define SECCOMP_IOCTL_NOTIF_RECV SECCOMP_IOWR(0, struct seccomp_notif) > +#define SECCOMP_IOCTL_NOTIF_SEND SECCOMP_IOWR(1, \ > + struct seccomp_notif_resp) > +#define SECCOMP_IOCTL_NOTIF_ID_VALID SECCOMP_IOR(2, __u64) > #endif /* _UAPI_LINUX_SECCOMP_H */ > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > index 393e029f778a..15b6be97fc09 100644 > --- a/kernel/seccomp.c > +++ b/kernel/seccomp.c > @@ -33,12 +33,74 @@ > #endif > > #ifdef CONFIG_SECCOMP_FILTER > +#include <linux/file.h> > #include <linux/filter.h> > #include <linux/pid.h> > #include <linux/ptrace.h> > #include <linux/security.h> > #include <linux/tracehook.h> > #include <linux/uaccess.h> > +#include <linux/anon_inodes.h> > + > +enum notify_state { > + SECCOMP_NOTIFY_INIT, > + SECCOMP_NOTIFY_SENT, > + SECCOMP_NOTIFY_REPLIED, > +}; > + > +struct seccomp_knotif { > + /* The struct pid of the task whose filter triggered the notification */ > + struct task_struct *task; > + > + /* The "cookie" for this request; this is unique for this filter. */ > + u64 id; > + > + /* > + * The seccomp data. This pointer is valid the entire time this > + * notification is active, since it comes from __seccomp_filter which > + * eclipses the entire lifecycle here. > + */ > + const struct seccomp_data *data; > + > + /* > + * Notification states. When SECCOMP_RET_USER_NOTIF is returned, a > + * struct seccomp_knotif is created and starts out in INIT. Once the > + * handler reads the notification off of an FD, it transitions to SENT. > + * If a signal is received the state transitions back to INIT and > + * another message is sent. When the userspace handler replies, state > + * transitions to REPLIED. 
> + */ > + enum notify_state state; > + > + /* The return values, only valid when in SECCOMP_NOTIFY_REPLIED */ > + int error; > + long val; > + > + /* Signals when this has entered SECCOMP_NOTIFY_REPLIED */ > + struct completion ready; > + > + struct list_head list; > +}; > + > +/** > + * struct notification - container for seccomp userspace notifications. Since > + * most seccomp filters will not have notification listeners attached and this > + * structure is fairly large, we store the notification-specific stuff in a > + * separate structure. > + * > + * @request: A semaphore that users of this notification can wait on for > + * changes. Actual reads and writes are still controlled with > + * filter->notify_lock. > + * @next_id: The id of the next request. > + * @notifications: A list of struct seccomp_knotif elements. > + * @wqh: A wait queue for poll. > + */ > +struct notification { > + struct semaphore request; > + u64 next_id; > + struct list_head notifications; > + wait_queue_head_t wqh; > +}; > > /** > * struct seccomp_filter - container for seccomp BPF programs > @@ -50,6 +112,8 @@ > * @log: true if all actions except for SECCOMP_RET_ALLOW should be logged > * @prev: points to a previously installed, or inherited, filter > * @prog: the BPF program to evaluate > + * @notif: the struct that holds all notification related information > + * @notify_lock: A lock for all notification-related accesses. > * > * seccomp_filter objects are organized in a tree linked via the @prev > * pointer. For any task, it appears to be a singly-linked list starting > @@ -66,6 +130,8 @@ struct seccomp_filter { > bool log; > struct seccomp_filter *prev; > struct bpf_prog *prog; > + struct notification *notif; > + struct mutex notify_lock; > }; > > /* Limit any path through the tree to 256KB worth of instructions. 
*/ > @@ -386,6 +452,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog) > if (!sfilter) > return ERR_PTR(-ENOMEM); > > + mutex_init(&sfilter->notify_lock); > ret = bpf_prog_create_from_user(&sfilter->prog, fprog, > seccomp_check_filter, save_orig); > if (ret < 0) { > @@ -479,7 +546,6 @@ static long seccomp_attach_filter(unsigned int flags, > > static void __get_seccomp_filter(struct seccomp_filter *filter) > { > - /* Reference count is bounded by the number of total processes. */ > refcount_inc(&filter->usage); > } > > @@ -550,11 +616,13 @@ static void seccomp_send_sigsys(int syscall, int reason) > #define SECCOMP_LOG_TRACE (1 << 4) > #define SECCOMP_LOG_LOG (1 << 5) > #define SECCOMP_LOG_ALLOW (1 << 6) > +#define SECCOMP_LOG_USER_NOTIF (1 << 7) > > static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS | > SECCOMP_LOG_KILL_THREAD | > SECCOMP_LOG_TRAP | > SECCOMP_LOG_ERRNO | > + SECCOMP_LOG_USER_NOTIF | > SECCOMP_LOG_TRACE | > SECCOMP_LOG_LOG; > > @@ -575,6 +643,9 @@ static inline void seccomp_log(unsigned long syscall, long signr, u32 action, > case SECCOMP_RET_TRACE: > log = requested && seccomp_actions_logged & SECCOMP_LOG_TRACE; > break; > + case SECCOMP_RET_USER_NOTIF: > + log = requested && seccomp_actions_logged & SECCOMP_LOG_USER_NOTIF; > + break; > case SECCOMP_RET_LOG: > log = seccomp_actions_logged & SECCOMP_LOG_LOG; > break; > @@ -646,6 +717,68 @@ void secure_computing_strict(int this_syscall) > #else > > #ifdef CONFIG_SECCOMP_FILTER > +static u64 seccomp_next_notify_id(struct seccomp_filter *filter) > +{ > + /* > + * Note: overflow is ok here, the id just needs to be unique per > + * filter. 
> + */ > + lockdep_assert_held(&filter->notify_lock); > + return filter->notif->next_id++; > +} > + > +static void seccomp_do_user_notification(int this_syscall, > + struct seccomp_filter *match, > + const struct seccomp_data *sd) > +{ > + int err; > + long ret = 0; > + struct seccomp_knotif n = {}; > + > + mutex_lock(&match->notify_lock); > + err = -ENOSYS; > + if (!match->notif) > + goto out; > + > + n.task = current; > + n.state = SECCOMP_NOTIFY_INIT; > + n.data = sd; > + n.id = seccomp_next_notify_id(match); > + init_completion(&n.ready); > + list_add(&n.list, &match->notif->notifications); > + > + up(&match->notif->request); > + wake_up_poll(&match->notif->wqh, EPOLLIN | EPOLLRDNORM); > + mutex_unlock(&match->notify_lock); > + > + /* > + * This is where we wait for a reply from userspace. > + */ > + err = wait_for_completion_interruptible(&n.ready); > + mutex_lock(&match->notify_lock); > + if (err == 0) { > + ret = n.val; > + err = n.error; > + } > + > + /* > + * Note that it's possible the listener died in between the time when > + * we were notified of a response (or a signal) and when we were able to > + * re-acquire the lock, so only delete from the list if the > + * notification actually exists. > + * > + * Also note that this test is only valid because there's no way to > + * *reattach* to a notifier right now. If one is added, we'll need to > + * keep track of the notif itself and make sure they match here. 
> + */ > + if (match->notif) > + list_del(&n.list); > +out: > + mutex_unlock(&match->notify_lock); > + syscall_set_return_value(current, task_pt_regs(current), > + err, ret); > +} > + > static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, > const bool recheck_after_trace) > { > @@ -728,6 +861,10 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, > > return 0; > > + case SECCOMP_RET_USER_NOTIF: > + seccomp_do_user_notification(this_syscall, match, sd); > + goto skip; > + > case SECCOMP_RET_LOG: > seccomp_log(this_syscall, 0, action, true); > return 0; > @@ -834,6 +971,263 @@ static long seccomp_set_mode_strict(void) > } > > #ifdef CONFIG_SECCOMP_FILTER > +static int seccomp_notify_release(struct inode *inode, struct file *file) > +{ > + struct seccomp_filter *filter = file->private_data; > + struct seccomp_knotif *knotif; > + > + mutex_lock(&filter->notify_lock); > + > + /* > + * If this file is being closed because e.g. the task who owned it > + * died, let's wake everyone up who was waiting on us. 
> + */ > + list_for_each_entry(knotif, &filter->notif->notifications, list) { > + if (knotif->state == SECCOMP_NOTIFY_REPLIED) > + continue; > + > + knotif->state = SECCOMP_NOTIFY_REPLIED; > + knotif->error = -ENOSYS; > + knotif->val = 0; > + > + complete(&knotif->ready); > + } > + > + kfree(filter->notif); > + filter->notif = NULL; > + mutex_unlock(&filter->notify_lock); > + __put_seccomp_filter(filter); > + return 0; > +} > + > +static long seccomp_notify_recv(struct seccomp_filter *filter, > + void __user *buf) > +{ > + struct seccomp_knotif *knotif = NULL, *cur; > + struct seccomp_notif unotif; > + ssize_t ret; > + > + memset(&unotif, 0, sizeof(unotif)); > + > + ret = down_interruptible(&filter->notif->request); > + if (ret < 0) > + return ret; > + > + mutex_lock(&filter->notify_lock); > + list_for_each_entry(cur, &filter->notif->notifications, list) { > + if (cur->state == SECCOMP_NOTIFY_INIT) { > + knotif = cur; > + break; > + } > + } > + > + /* > + * If we didn't find a notification, it could be that the task was > + * interrupted by a fatal signal between the time we were woken and > + * when we were able to acquire the rw lock. > + */ > + if (!knotif) { > + ret = -ENOENT; > + goto out; > + } > + > + unotif.id = knotif->id; > + unotif.pid = task_pid_vnr(knotif->task); > + unotif.data = *(knotif->data); > + > + knotif->state = SECCOMP_NOTIFY_SENT; > + wake_up_poll(&filter->notif->wqh, EPOLLOUT | EPOLLWRNORM); > + ret = 0; > +out: > + mutex_unlock(&filter->notify_lock); > + > + if (ret == 0 && copy_to_user(buf, &unotif, sizeof(unotif))) { > + ret = -EFAULT; > + > + /* > + * Userspace screwed up. To make sure that we keep this > + * notification alive, let's reset it back to INIT. It > + * may have died when we released the lock, so we need to make > + * sure it's still around. 
> + */ > + knotif = NULL; > + mutex_lock(&filter->notify_lock); > + list_for_each_entry(cur, &filter->notif->notifications, list) { > + if (cur->id == unotif.id) { > + knotif = cur; > + break; > + } > + } > + > + if (knotif) { > + knotif->state = SECCOMP_NOTIFY_INIT; > + up(&filter->notif->request); > + } > + mutex_unlock(&filter->notify_lock); > + } > + > + return ret; > +} > + > +static long seccomp_notify_send(struct seccomp_filter *filter, > + void __user *buf) > +{ > + struct seccomp_notif_resp resp = {}; > + struct seccomp_knotif *knotif = NULL, *cur; > + long ret; > + > + if (copy_from_user(&resp, buf, sizeof(resp))) > + return -EFAULT; > + > + if (resp.flags) > + return -EINVAL; > + > + ret = mutex_lock_interruptible(&filter->notify_lock); > + if (ret < 0) > + return ret; > + > + list_for_each_entry(cur, &filter->notif->notifications, list) { > + if (cur->id == resp.id) { > + knotif = cur; > + break; > + } > + } > + > + if (!knotif) { > + ret = -ENOENT; > + goto out; > + } > + > + /* Allow exactly one reply. 
*/ > + if (knotif->state != SECCOMP_NOTIFY_SENT) { > + ret = -EINPROGRESS; > + goto out; > + } > + > + ret = 0; > + knotif->state = SECCOMP_NOTIFY_REPLIED; > + knotif->error = resp.error; > + knotif->val = resp.val; > + complete(&knotif->ready); > +out: > + mutex_unlock(&filter->notify_lock); > + return ret; > +} > + > +static long seccomp_notify_id_valid(struct seccomp_filter *filter, > + void __user *buf) > +{ > + struct seccomp_knotif *knotif = NULL; > + u64 id; > + long ret; > + > + if (copy_from_user(&id, buf, sizeof(id))) > + return -EFAULT; > + > + ret = mutex_lock_interruptible(&filter->notify_lock); > + if (ret < 0) > + return ret; > + > + ret = -ENOENT; > + list_for_each_entry(knotif, &filter->notif->notifications, list) { > + if (knotif->id == id) { > + if (knotif->state == SECCOMP_NOTIFY_SENT) > + ret = 0; > + goto out; > + } > + } > + > +out: > + mutex_unlock(&filter->notify_lock); > + return ret; > +} > + > +static long seccomp_notify_ioctl(struct file *file, unsigned int cmd, > + unsigned long arg) > +{ > + struct seccomp_filter *filter = file->private_data; > + void __user *buf = (void __user *)arg; > + > + switch (cmd) { > + case SECCOMP_IOCTL_NOTIF_RECV: > + return seccomp_notify_recv(filter, buf); > + case SECCOMP_IOCTL_NOTIF_SEND: > + return seccomp_notify_send(filter, buf); > + case SECCOMP_IOCTL_NOTIF_ID_VALID: > + return seccomp_notify_id_valid(filter, buf); > + default: > + return -EINVAL; > + } > +} > + > +static __poll_t seccomp_notify_poll(struct file *file, > + struct poll_table_struct *poll_tab) > +{ > + struct seccomp_filter *filter = file->private_data; > + __poll_t ret = 0; > + struct seccomp_knotif *cur; > + > + poll_wait(file, &filter->notif->wqh, poll_tab); > + > + ret = mutex_lock_interruptible(&filter->notify_lock); > + if (ret < 0) > + return EPOLLERR; > + > + list_for_each_entry(cur, &filter->notif->notifications, list) { > + if (cur->state == SECCOMP_NOTIFY_INIT) > + ret |= EPOLLIN | EPOLLRDNORM; > + if (cur->state == 
SECCOMP_NOTIFY_SENT) > + ret |= EPOLLOUT | EPOLLWRNORM; > + if ((ret & EPOLLIN) && (ret & EPOLLOUT)) > + break; > + } > + > + mutex_unlock(&filter->notify_lock); > + > + return ret; > +} > + > +static const struct file_operations seccomp_notify_ops = { > + .poll = seccomp_notify_poll, > + .release = seccomp_notify_release, > + .unlocked_ioctl = seccomp_notify_ioctl, > +}; > + > +static struct file *init_listener(struct seccomp_filter *filter) > +{ > + struct file *ret = ERR_PTR(-EBUSY); > + struct seccomp_filter *cur; > + > + for (cur = current->seccomp.filter; cur; cur = cur->prev) { > + if (cur->notif) > + goto out; > + } > + > + ret = ERR_PTR(-ENOMEM); > + filter->notif = kzalloc(sizeof(*(filter->notif)), GFP_KERNEL); > + if (!filter->notif) > + goto out; > + > + sema_init(&filter->notif->request, 0); > + filter->notif->next_id = get_random_u64(); > + INIT_LIST_HEAD(&filter->notif->notifications); > + init_waitqueue_head(&filter->notif->wqh); > + > + ret = anon_inode_getfile("seccomp notify", &seccomp_notify_ops, > + filter, O_RDWR); > + if (IS_ERR(ret)) > + goto out_notif; > + > + /* The file has a reference to it now */ > + __get_seccomp_filter(filter); > + > +out_notif: > + if (IS_ERR(ret)) > + kfree(filter->notif); > +out: > + return ret; > +} > + > /** > * seccomp_set_mode_filter: internal function for setting seccomp filter > * @flags: flags to change filter behavior > @@ -853,6 +1247,8 @@ static long seccomp_set_mode_filter(unsigned int flags, > const unsigned long seccomp_mode = SECCOMP_MODE_FILTER; > struct seccomp_filter *prepared = NULL; > long ret = -EINVAL; > + int listener = -1; > + struct file *listener_f = NULL; > > /* Validate flags. 
*/ > if (flags & ~SECCOMP_FILTER_FLAG_MASK) > @@ -863,13 +1259,28 @@ static long seccomp_set_mode_filter(unsigned int flags, > if (IS_ERR(prepared)) > return PTR_ERR(prepared); > > + if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) { > + listener = get_unused_fd_flags(O_CLOEXEC); > + if (listener < 0) { > + ret = listener; > + goto out_free; > + } > + > + listener_f = init_listener(prepared); > + if (IS_ERR(listener_f)) { > + put_unused_fd(listener); > + ret = PTR_ERR(listener_f); > + goto out_free; > + } > + } > + > /* > * Make sure we cannot change seccomp or nnp state via TSYNC > * while another thread is in the middle of calling exec. > */ > if (flags & SECCOMP_FILTER_FLAG_TSYNC && > mutex_lock_killable(&current->signal->cred_guard_mutex)) > - goto out_free; > + goto out_put_fd; > > spin_lock_irq(&current->sighand->siglock); > > @@ -887,6 +1298,16 @@ static long seccomp_set_mode_filter(unsigned int flags, > spin_unlock_irq(&current->sighand->siglock); > if (flags & SECCOMP_FILTER_FLAG_TSYNC) > mutex_unlock(&current->signal->cred_guard_mutex); > +out_put_fd: > + if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) { > + if (ret < 0) { > + fput(listener_f); > + put_unused_fd(listener); > + } else { > + fd_install(listener, listener_f); > + ret = listener; > + } > + } > out_free: > seccomp_filter_free(prepared); > return ret; > @@ -911,6 +1332,7 @@ static long seccomp_get_action_avail(const char __user *uaction) > case SECCOMP_RET_KILL_THREAD: > case SECCOMP_RET_TRAP: > case SECCOMP_RET_ERRNO: > + case SECCOMP_RET_USER_NOTIF: > case SECCOMP_RET_TRACE: > case SECCOMP_RET_LOG: > case SECCOMP_RET_ALLOW: > @@ -922,6 +1344,20 @@ static long seccomp_get_action_avail(const char __user *uaction) > return 0; > } > > +static long seccomp_get_notif_sizes(void __user *usizes) > +{ > + struct seccomp_notif_sizes sizes = { > + .seccomp_notif = sizeof(struct seccomp_notif), > + .seccomp_notif_resp = sizeof(struct seccomp_notif_resp), > + .seccomp_data = sizeof(struct seccomp_data), > + }; > + > + if 
(copy_to_user(usizes, &sizes, sizeof(sizes))) > + return -EFAULT; > + > + return 0; > +} > + > /* Common entry point for both prctl and syscall. */ > static long do_seccomp(unsigned int op, unsigned int flags, > void __user *uargs) > @@ -938,6 +1374,11 @@ static long do_seccomp(unsigned int op, unsigned int flags, > return -EINVAL; > > return seccomp_get_action_avail(uargs); > + case SECCOMP_GET_NOTIF_SIZES: > + if (flags != 0) > + return -EINVAL; > + > + return seccomp_get_notif_sizes(uargs); > default: > return -EINVAL; > } > @@ -1111,6 +1552,7 @@ long seccomp_get_metadata(struct task_struct *task, > #define SECCOMP_RET_KILL_THREAD_NAME "kill_thread" > #define SECCOMP_RET_TRAP_NAME "trap" > #define SECCOMP_RET_ERRNO_NAME "errno" > +#define SECCOMP_RET_USER_NOTIF_NAME "user_notif" > #define SECCOMP_RET_TRACE_NAME "trace" > #define SECCOMP_RET_LOG_NAME "log" > #define SECCOMP_RET_ALLOW_NAME "allow" > @@ -1120,6 +1562,7 @@ static const char seccomp_actions_avail[] = > SECCOMP_RET_KILL_THREAD_NAME " " > SECCOMP_RET_TRAP_NAME " " > SECCOMP_RET_ERRNO_NAME " " > + SECCOMP_RET_USER_NOTIF_NAME " " > SECCOMP_RET_TRACE_NAME " " > SECCOMP_RET_LOG_NAME " " > SECCOMP_RET_ALLOW_NAME; > @@ -1134,6 +1577,7 @@ static const struct seccomp_log_name seccomp_log_names[] = { > { SECCOMP_LOG_KILL_THREAD, SECCOMP_RET_KILL_THREAD_NAME }, > { SECCOMP_LOG_TRAP, SECCOMP_RET_TRAP_NAME }, > { SECCOMP_LOG_ERRNO, SECCOMP_RET_ERRNO_NAME }, > + { SECCOMP_LOG_USER_NOTIF, SECCOMP_RET_USER_NOTIF_NAME }, > { SECCOMP_LOG_TRACE, SECCOMP_RET_TRACE_NAME }, > { SECCOMP_LOG_LOG, SECCOMP_RET_LOG_NAME }, > { SECCOMP_LOG_ALLOW, SECCOMP_RET_ALLOW_NAME }, > diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c > index e1473234968d..5c9768a1b8cd 100644 > --- a/tools/testing/selftests/seccomp/seccomp_bpf.c > +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c > @@ -5,6 +5,7 @@ > * Test code for seccomp bpf. 
> */ > > +#define _GNU_SOURCE > #include <sys/types.h> > > /* > @@ -40,10 +41,12 @@ > #include <sys/fcntl.h> > #include <sys/mman.h> > #include <sys/times.h> > +#include <sys/socket.h> > +#include <sys/ioctl.h> > > -#define _GNU_SOURCE > #include <unistd.h> > #include <sys/syscall.h> > +#include <poll.h> > > #include "../kselftest_harness.h" > > @@ -133,6 +136,10 @@ struct seccomp_data { > #define SECCOMP_GET_ACTION_AVAIL 2 > #endif > > +#ifndef SECCOMP_GET_NOTIF_SIZES > +#define SECCOMP_GET_NOTIF_SIZES 3 > +#endif > + > #ifndef SECCOMP_FILTER_FLAG_TSYNC > #define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0) > #endif > @@ -154,6 +161,44 @@ struct seccomp_metadata { > }; > #endif > > +#ifndef SECCOMP_FILTER_FLAG_NEW_LISTENER > +#define SECCOMP_FILTER_FLAG_NEW_LISTENER (1UL << 3) > + > +#define SECCOMP_RET_USER_NOTIF 0x7fc00000U > + > +#define SECCOMP_IOC_MAGIC '!' > +#define SECCOMP_IO(nr) _IO(SECCOMP_IOC_MAGIC, nr) > +#define SECCOMP_IOR(nr, type) _IOR(SECCOMP_IOC_MAGIC, nr, type) > +#define SECCOMP_IOW(nr, type) _IOW(SECCOMP_IOC_MAGIC, nr, type) > +#define SECCOMP_IOWR(nr, type) _IOWR(SECCOMP_IOC_MAGIC, nr, type) > + > +/* Flags for seccomp notification fd ioctl. 
*/ > +#define SECCOMP_IOCTL_NOTIF_RECV SECCOMP_IOWR(0, struct seccomp_notif) > +#define SECCOMP_IOCTL_NOTIF_SEND SECCOMP_IOWR(1, \ > + struct seccomp_notif_resp) > +#define SECCOMP_IOCTL_NOTIF_ID_VALID SECCOMP_IOR(2, __u64) > + > +struct seccomp_notif { > + __u64 id; > + __u32 pid; > + __u32 flags; > + struct seccomp_data data; > +}; > + > +struct seccomp_notif_resp { > + __u64 id; > + __s64 val; > + __s32 error; > + __u32 flags; > +}; > + > +struct seccomp_notif_sizes { > + __u16 seccomp_notif; > + __u16 seccomp_notif_resp; > + __u16 seccomp_data; > +}; > +#endif > + > #ifndef seccomp > int seccomp(unsigned int op, unsigned int flags, void *args) > { > @@ -2077,7 +2122,8 @@ TEST(detect_seccomp_filter_flags) > { > unsigned int flags[] = { SECCOMP_FILTER_FLAG_TSYNC, > SECCOMP_FILTER_FLAG_LOG, > - SECCOMP_FILTER_FLAG_SPEC_ALLOW }; > + SECCOMP_FILTER_FLAG_SPEC_ALLOW, > + SECCOMP_FILTER_FLAG_NEW_LISTENER }; > unsigned int flag, all_flags; > int i; > long ret; > @@ -2933,6 +2979,403 @@ TEST(get_metadata) > ASSERT_EQ(0, kill(pid, SIGKILL)); > } > > +static int user_trap_syscall(int nr, unsigned int flags) > +{ > + struct sock_filter filter[] = { > + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, > + offsetof(struct seccomp_data, nr)), > + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, nr, 0, 1), > + BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_USER_NOTIF), > + BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), > + }; > + > + struct sock_fprog prog = { > + .len = (unsigned short)ARRAY_SIZE(filter), > + .filter = filter, > + }; > + > + return seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog); > +} > + > +#define USER_NOTIF_MAGIC 116983961184613L > +TEST(user_notification_basic) > +{ > + pid_t pid; > + long ret; > + int status, listener; > + struct seccomp_notif req = {}; > + struct seccomp_notif_resp resp = {}; > + struct pollfd pollfd; > + > + struct sock_filter filter[] = { > + BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW), > + }; > + struct sock_fprog prog = { > + .len = (unsigned short)ARRAY_SIZE(filter), > + .filter = 
filter, > + }; > + > + pid = fork(); > + ASSERT_GE(pid, 0); > + > + /* Check that we get -ENOSYS with no listener attached */ > + if (pid == 0) { > + if (user_trap_syscall(__NR_getpid, 0) < 0) > + exit(1); > + ret = syscall(__NR_getpid); > + exit(ret >= 0 || errno != ENOSYS); > + } > + > + EXPECT_EQ(waitpid(pid, &status, 0), pid); > + EXPECT_EQ(true, WIFEXITED(status)); > + EXPECT_EQ(0, WEXITSTATUS(status)); > + > + /* Add some no-op filters so for grins. */ > + EXPECT_EQ(seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog), 0); > + EXPECT_EQ(seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog), 0); > + EXPECT_EQ(seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog), 0); > + EXPECT_EQ(seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog), 0); > + > + /* Check that the basic notification machinery works */ > + listener = user_trap_syscall(__NR_getpid, > + SECCOMP_FILTER_FLAG_NEW_LISTENER); > + EXPECT_GE(listener, 0); > + > + /* Installing a second listener in the chain should EBUSY */ > + EXPECT_EQ(user_trap_syscall(__NR_getpid, > + SECCOMP_FILTER_FLAG_NEW_LISTENER), > + -1); > + EXPECT_EQ(errno, EBUSY); > + > + pid = fork(); > + ASSERT_GE(pid, 0); > + > + if (pid == 0) { > + ret = syscall(__NR_getpid); > + exit(ret != USER_NOTIF_MAGIC); > + } > + > + pollfd.fd = listener; > + pollfd.events = POLLIN | POLLOUT; > + > + EXPECT_GT(poll(&pollfd, 1, -1), 0); > + EXPECT_EQ(pollfd.revents, POLLIN); > + > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0); > + > + pollfd.fd = listener; > + pollfd.events = POLLIN | POLLOUT; > + > + EXPECT_GT(poll(&pollfd, 1, -1), 0); > + EXPECT_EQ(pollfd.revents, POLLOUT); > + > + EXPECT_EQ(req.data.nr, __NR_getpid); > + > + resp.id = req.id; > + resp.error = 0; > + resp.val = USER_NOTIF_MAGIC; > + > + /* check that we make sure flags == 0 */ > + resp.flags = 1; > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), -1); > + EXPECT_EQ(errno, EINVAL); > + > + resp.flags = 0; > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0); > + > + 
EXPECT_EQ(waitpid(pid, &status, 0), pid); > + EXPECT_EQ(true, WIFEXITED(status)); > + EXPECT_EQ(0, WEXITSTATUS(status)); > +} > + > +TEST(user_notification_kill_in_middle) > +{ > + pid_t pid; > + long ret; > + int listener; > + struct seccomp_notif req = {}; > + struct seccomp_notif_resp resp = {}; > + > + listener = user_trap_syscall(__NR_getpid, > + SECCOMP_FILTER_FLAG_NEW_LISTENER); > + EXPECT_GE(listener, 0); > + > + /* > + * Check that nothing bad happens when we kill the task in the middle > + * of a syscall. > + */ > + pid = fork(); > + ASSERT_GE(pid, 0); > + > + if (pid == 0) { > + ret = syscall(__NR_getpid); > + exit(ret != USER_NOTIF_MAGIC); > + } > + > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0); > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ID_VALID, &req.id), 0); > + > + EXPECT_EQ(kill(pid, SIGKILL), 0); > + EXPECT_EQ(waitpid(pid, NULL, 0), pid); > + > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ID_VALID, &req.id), -1); > + > + resp.id = req.id; > + ret = ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp); > + EXPECT_EQ(ret, -1); > + EXPECT_EQ(errno, ENOENT); > +} > + > +static int handled = -1; > + > +static void signal_handler(int signal) > +{ > + if (write(handled, "c", 1) != 1) > + perror("write from signal"); > +} > + > +TEST(user_notification_signal) > +{ > + pid_t pid; > + long ret; > + int status, listener, sk_pair[2]; > + struct seccomp_notif req = {}; > + struct seccomp_notif_resp resp = {}; > + char c; > + > + ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0); > + > + listener = user_trap_syscall(__NR_gettid, > + SECCOMP_FILTER_FLAG_NEW_LISTENER); > + EXPECT_GE(listener, 0); > + > + pid = fork(); > + ASSERT_GE(pid, 0); > + > + if (pid == 0) { > + close(sk_pair[0]); > + handled = sk_pair[1]; > + if (signal(SIGUSR1, signal_handler) == SIG_ERR) { > + perror("signal"); > + exit(1); > + } > + /* > + * ERESTARTSYS behavior is a bit hard to test, because we need > + * to rely on a signal that has not yet been 
handled. Let's at > + * least check that the error code gets propagated through, and > + * hope that it doesn't break when there is actually a signal :) > + */ > + ret = syscall(__NR_gettid); > + exit(!(ret == -1 && errno == 512)); > + } > + > + close(sk_pair[1]); > + > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0); > + > + EXPECT_EQ(kill(pid, SIGUSR1), 0); > + > + /* > + * Make sure the signal really is delivered, which means we're not > + * stuck in the user notification code any more and the notification > + * should be dead. > + */ > + EXPECT_EQ(read(sk_pair[0], &c, 1), 1); > + > + resp.id = req.id; > + resp.error = -EPERM; > + resp.val = 0; > + > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), -1); > + EXPECT_EQ(errno, ENOENT); > + > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0); > + > + resp.id = req.id; > + resp.error = -512; /* -ERESTARTSYS */ > + resp.val = 0; > + > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0); > + > + EXPECT_EQ(waitpid(pid, &status, 0), pid); > + EXPECT_EQ(true, WIFEXITED(status)); > + EXPECT_EQ(0, WEXITSTATUS(status)); > +} > + > +TEST(user_notification_closed_listener) > +{ > + pid_t pid; > + long ret; > + int status, listener; > + > + listener = user_trap_syscall(__NR_getpid, > + SECCOMP_FILTER_FLAG_NEW_LISTENER); > + EXPECT_GE(listener, 0); > + > + /* > + * Check that we get an ENOSYS when the listener is closed. > + */ > + pid = fork(); > + ASSERT_GE(pid, 0); > + if (pid == 0) { > + close(listener); > + ret = syscall(__NR_getpid); > + exit(ret != -1 && errno != ENOSYS); > + } > + > + close(listener); > + > + EXPECT_EQ(waitpid(pid, &status, 0), pid); > + EXPECT_EQ(true, WIFEXITED(status)); > + EXPECT_EQ(0, WEXITSTATUS(status)); > +} > + > +/* > + * Check that a pid in a child namespace still shows up as valid in ours. 
> + */ > +TEST(user_notification_child_pid_ns) > +{ > + pid_t pid; > + int status, listener; > + struct seccomp_notif req = {}; > + struct seccomp_notif_resp resp = {}; > + > + ASSERT_EQ(unshare(CLONE_NEWPID), 0); > + > + listener = user_trap_syscall(__NR_getpid, SECCOMP_FILTER_FLAG_NEW_LISTENER); > + ASSERT_GE(listener, 0); > + > + pid = fork(); > + ASSERT_GE(pid, 0); > + > + if (pid == 0) > + exit(syscall(__NR_getpid) != USER_NOTIF_MAGIC); > + > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0); > + EXPECT_EQ(req.pid, pid); > + > + resp.id = req.id; > + resp.error = 0; > + resp.val = USER_NOTIF_MAGIC; > + > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0); > + > + EXPECT_EQ(waitpid(pid, &status, 0), pid); > + EXPECT_EQ(true, WIFEXITED(status)); > + EXPECT_EQ(0, WEXITSTATUS(status)); > + close(listener); > +} > + > +/* > + * Check that a pid in a sibling (i.e. unrelated) namespace shows up as 0, i.e. > + * invalid. > + */ > +TEST(user_notification_sibling_pid_ns) > +{ > + pid_t pid, pid2; > + int status, listener; > + struct seccomp_notif req = {}; > + struct seccomp_notif_resp resp = {}; > + > + listener = user_trap_syscall(__NR_getpid, SECCOMP_FILTER_FLAG_NEW_LISTENER); > + ASSERT_GE(listener, 0); > + > + pid = fork(); > + ASSERT_GE(pid, 0); > + > + if (pid == 0) { > + ASSERT_EQ(unshare(CLONE_NEWPID), 0); > + > + pid2 = fork(); > + ASSERT_GE(pid2, 0); > + > + if (pid2 == 0) > + exit(syscall(__NR_getpid) != USER_NOTIF_MAGIC); > + > + EXPECT_EQ(waitpid(pid2, &status, 0), pid2); > + EXPECT_EQ(true, WIFEXITED(status)); > + EXPECT_EQ(0, WEXITSTATUS(status)); > + exit(WEXITSTATUS(status)); > + } > + > + /* Create the sibling ns, and sibling in it. */ > + EXPECT_EQ(unshare(CLONE_NEWPID), 0); > + EXPECT_EQ(errno, 0); > + > + pid2 = fork(); > + EXPECT_GE(pid2, 0); > + > + if (pid2 == 0) { > + ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0); > + /* > + * The pid should be 0, i.e. 
the task is in some namespace that > + * we can't "see". > + */ > + ASSERT_EQ(req.pid, 0); > + > + resp.id = req.id; > + resp.error = 0; > + resp.val = USER_NOTIF_MAGIC; > + > + ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0); > + exit(0); > + } > + > + close(listener); > + > + EXPECT_EQ(waitpid(pid, &status, 0), pid); > + EXPECT_EQ(true, WIFEXITED(status)); > + EXPECT_EQ(0, WEXITSTATUS(status)); > + > + EXPECT_EQ(waitpid(pid2, &status, 0), pid2); > + EXPECT_EQ(true, WIFEXITED(status)); > + EXPECT_EQ(0, WEXITSTATUS(status)); > +} > + > +TEST(user_notification_fault_recv) > +{ > + pid_t pid; > + int status, listener; > + struct seccomp_notif req = {}; > + struct seccomp_notif_resp resp = {}; > + > + listener = user_trap_syscall(__NR_getpid, SECCOMP_FILTER_FLAG_NEW_LISTENER); > + ASSERT_GE(listener, 0); > + > + pid = fork(); > + ASSERT_GE(pid, 0); > + > + if (pid == 0) > + exit(syscall(__NR_getpid) != USER_NOTIF_MAGIC); > + > + /* Do a bad recv() */ > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, NULL), -1); > + EXPECT_EQ(errno, EFAULT); > + > + /* We should still be able to receive this notification, though. 
*/ > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0); > + EXPECT_EQ(req.pid, pid); > + > + resp.id = req.id; > + resp.error = 0; > + resp.val = USER_NOTIF_MAGIC; > + > + EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0); > + > + EXPECT_EQ(waitpid(pid, &status, 0), pid); > + EXPECT_EQ(true, WIFEXITED(status)); > + EXPECT_EQ(0, WEXITSTATUS(status)); > +} > + > +TEST(seccomp_get_notif_sizes) > +{ > + struct seccomp_notif_sizes sizes; > + > + EXPECT_EQ(seccomp(SECCOMP_GET_NOTIF_SIZES, 0, &sizes), 0); > + EXPECT_EQ(sizes.seccomp_notif, sizeof(struct seccomp_notif)); > + EXPECT_EQ(sizes.seccomp_notif_resp, sizeof(struct seccomp_notif_resp)); > +} > + > /* > * TODO: > * - add microbenchmarks > -- > 2.19.1 > -- Kees Cook _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers
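
For anyone reading along who wants to see how the SECCOMP_GET_NOTIF_SIZES piece might be consumed from userspace, here is a rough sketch. It is not part of the patch. The idea, as I understand it, is that a handler asks the kernel how large its notification structs are and sizes its buffers from that answer, falling back to its own compile-time sizes when the operation is not available. All of the sketch_* and SKETCH_* names are made up for illustration; the struct layouts and the operation number 3 are copied from the quoted selftest definitions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Operation number from the quoted patch; the name is hypothetical and
 * local so the sketch builds against headers that predate the patch. */
#define SKETCH_GET_NOTIF_SIZES 3

/* Local mirrors of the structs in the patch (hypothetical names). */
struct sketch_seccomp_data {
	int32_t nr;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t args[6];
};

struct sketch_notif {
	uint64_t id;
	uint32_t pid;
	uint32_t flags;
	struct sketch_seccomp_data data;
};

struct sketch_notif_resp {
	uint64_t id;
	int64_t val;
	int32_t error;
	uint32_t flags;
};

struct sketch_notif_sizes {
	uint16_t seccomp_notif;
	uint16_t seccomp_notif_resp;
	uint16_t seccomp_data;
};

/* Ask the kernel for its struct sizes. If the call fails (older kernel,
 * or the syscall is blocked), return our compile-time sizes instead. */
static struct sketch_notif_sizes query_notif_sizes(void)
{
	struct sketch_notif_sizes fallback = {
		.seccomp_notif = sizeof(struct sketch_notif),
		.seccomp_notif_resp = sizeof(struct sketch_notif_resp),
		.seccomp_data = sizeof(struct sketch_seccomp_data),
	};
#ifdef __NR_seccomp
	struct sketch_notif_sizes kernel = { 0 };

	if (syscall(__NR_seccomp, SKETCH_GET_NOTIF_SIZES, 0U, &kernel) == 0)
		return kernel;
#endif
	return fallback;
}

/* Allocate a zeroed request buffer at least as large as both the
 * kernel's idea of the struct and ours. Caller frees it. */
static struct sketch_notif *alloc_notif(const struct sketch_notif_sizes *sizes)
{
	size_t len = sizes->seccomp_notif;

	if (len < sizeof(struct sketch_notif))
		len = sizeof(struct sketch_notif);
	return calloc(1, len);
}

#ifdef SKETCH_DEMO_MAIN
int main(void)
{
	struct sketch_notif_sizes sizes = query_notif_sizes();
	struct sketch_notif *req = alloc_notif(&sizes);

	assert(req != NULL);
	printf("notif=%u resp=%u data=%u\n", (unsigned)sizes.seccomp_notif,
	       (unsigned)sizes.seccomp_notif_resp, (unsigned)sizes.seccomp_data);
	free(req);
	return 0;
}
#endif
```

Building with -DSKETCH_DEMO_MAIN gives a small program that prints whichever sizes were chosen. A real handler would pass the allocated buffer to SECCOMP_IOCTL_NOTIF_RECV on the listener fd, which this sketch deliberately leaves out.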