On Thu, Oct 01, 2020 at 05:47:54PM +0200, Jann Horn wrote:
> On Thu, Oct 1, 2020 at 2:54 PM Christian Brauner
> <christian.brauner@xxxxxxxxxxxxx> wrote:
> > On Wed, Sep 30, 2020 at 05:53:46PM +0200, Jann Horn via Containers wrote:
> > > On Wed, Sep 30, 2020 at 1:07 PM Michael Kerrisk (man-pages)
> > > <mtk.manpages@xxxxxxxxx> wrote:
> > > > NOTES
> > > >        The file descriptor returned when seccomp(2) is employed with the
> > > >        SECCOMP_FILTER_FLAG_NEW_LISTENER flag can be monitored using
> > > >        poll(2), epoll(7), and select(2). When a notification is
> > > >        pending, these interfaces indicate that the file descriptor is
> > > >        readable.
> > >
> > > We should probably also point out somewhere that, as
> > > include/uapi/linux/seccomp.h says:
> > >
> > >  * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF
> > >  * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the
> > >  * same syscall, the most recently added filter takes precedence. This means
> > >  * that the new SECCOMP_RET_USER_NOTIF filter can override any
> > >  * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all
> > >  * such filtered syscalls to be executed by sending the response
> > >  * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally
> > >  * be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE.
> > >
> > > In other words, from a security perspective, you must assume that the
> > > target process can bypass any SECCOMP_RET_USER_NOTIF (or
> > > SECCOMP_RET_TRACE) filters unless it is completely prohibited from
> > > calling seccomp(). This should also be noted over in the main
> > > seccomp(2) manpage, especially the SECCOMP_RET_TRACE part.
> >
> > So I was actually wondering about this when I skimmed this and a while
> > ago but forgot about this again... Afaict, you can only ever load a
> > single filter with SECCOMP_FILTER_FLAG_NEW_LISTENER set. If there
> > already is a filter with the SECCOMP_FILTER_FLAG_NEW_LISTENER property
> > in the tasks filter hierarchy then the kernel will refuse to load a new
> > one?
> >
> > static struct file *init_listener(struct seccomp_filter *filter)
> > {
> >         struct file *ret = ERR_PTR(-EBUSY);
> >         struct seccomp_filter *cur;
> >
> >         for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> >                 if (cur->notif)
> >                         goto out;
> >         }
> >
> > shouldn't that be sufficient to guarantee that USER_NOTIF filters can't
> > override each other for the same task simply because there can only ever
> > be a single one?
>
> Good point. Exceeeept that that check seems ineffective because this
> happens before we take the locks that guard against TSYNC, and also
> before we decide to which existing filter we want to chain the new
> filter. So if two threads race with TSYNC, I think they'll be able to
> chain two filters with listeners together.

That's a bug, imho. I don't have source code in front of me right now
though.

>
> I don't know whether we want to eternalize this "only one listener
> across all the filters" restriction in the manpage though, or whether
> the man page should just say that the kernel currently doesn't support
> it but that security-wise you should assume that it might at some
> point.

Maybe. I would argue that it might be worth having at least a new
flag/option to indicate either "This is a non-overridable filter." or at
least for the seccomp notifier have an option to indicate that no other
notifier can be installed.

Christian