On 10/1/20 7:12 PM, Christian Brauner wrote: > On Thu, Oct 01, 2020 at 10:58:50AM -0600, Tycho Andersen wrote: >> On Thu, Oct 01, 2020 at 05:47:54PM +0200, Jann Horn via Containers wrote: >>> On Thu, Oct 1, 2020 at 2:54 PM Christian Brauner >>> <christian.brauner@xxxxxxxxxxxxx> wrote: >>>> On Wed, Sep 30, 2020 at 05:53:46PM +0200, Jann Horn via Containers wrote: >>>>> On Wed, Sep 30, 2020 at 1:07 PM Michael Kerrisk (man-pages) >>>>> <mtk.manpages@xxxxxxxxx> wrote: >>>>>> NOTES >>>>>> The file descriptor returned when seccomp(2) is employed with the >>>>>> SECCOMP_FILTER_FLAG_NEW_LISTENER flag can be monitored using >>>>>> poll(2), epoll(7), and select(2). When a notification is pend‐ >>>>>> ing, these interfaces indicate that the file descriptor is read‐ >>>>>> able. >>>>> >>>>> We should probably also point out somewhere that, as >>>>> include/uapi/linux/seccomp.h says: >>>>> >>>>> * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF >>>>> * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the >>>>> * same syscall, the most recently added filter takes precedence. This means >>>>> * that the new SECCOMP_RET_USER_NOTIF filter can override any >>>>> * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all >>>>> * such filtered syscalls to be executed by sending the response >>>>> * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally >>>>> * be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE. >>>>> >>>>> In other words, from a security perspective, you must assume that the >>>>> target process can bypass any SECCOMP_RET_USER_NOTIF (or >>>>> SECCOMP_RET_TRACE) filters unless it is completely prohibited from >>>>> calling seccomp(). This should also be noted over in the main >>>>> seccomp(2) manpage, especially the SECCOMP_RET_TRACE part. >>>> >>>> So I was actually wondering about this when I skimmed this and a while >>>> ago but forgot about this again... Afaict, you can only ever load a >>>> single filter with SECCOMP_FILTER_FLAG_NEW_LISTENER set. If there >>>> already is a filter with the SECCOMP_FILTER_FLAG_NEW_LISTENER property >>>> in the tasks filter hierarchy then the kernel will refuse to load a new >>>> one? >>>> >>>> static struct file *init_listener(struct seccomp_filter *filter) >>>> { >>>> struct file *ret = ERR_PTR(-EBUSY); >>>> struct seccomp_filter *cur; >>>> >>>> for (cur = current->seccomp.filter; cur; cur = cur->prev) { >>>> if (cur->notif) >>>> goto out; >>>> } >>>> >>>> shouldn't that be sufficient to guarantee that USER_NOTIF filters can't >>>> override each other for the same task simply because there can only ever >>>> be a single one? >>> >>> Good point. Exceeeept that that check seems ineffective because this >>> happens before we take the locks that guard against TSYNC, and also >>> before we decide to which existing filter we want to chain the new >>> filter. So if two threads race with TSYNC, I think they'll be able to >>> chain two filters with listeners together. >> >> Yep, seems the check needs to also be in seccomp_can_sync_threads() to >> be totally effective, >> >>> I don't know whether we want to eternalize this "only one listener >>> across all the filters" restriction in the manpage though, or whether >>> the man page should just say that the kernel currently doesn't support >>> it but that security-wise you should assume that it might at some >>> point. >> >> This requirement originally came from Andy, arguing that the semantics >> of this were/are confusing, which still makes sense to me. Perhaps we >> should do something like the below? > > I think we should either keep up this restriction and then cement it in > the manpage or add a flag to indicate that the notifier is > non-overridable. > I don't care about the default too much, i.e. whether it's overridable > by default and exclusive if opting in or the other way around doesn't > matter too much. But from a supervisor's perspective it'd be quite nice > to be able to be sure that a notifier can't be overriden by another > notifier. > > I think having a flag would provide the greatest flexibility but I agree > that the semantics of multiple listeners are kinda odd. So, for now, I have applied the patch at the foot of this mail to the pages. Does this seem correct? > Below looks sane to me though again, I'm not sitting in fron of source > code. [...] Thanks, Michael PS Jann, if you see this, I'm still working through your (extensive and very helpful) review comments. I will be sending a response. ====== diff --git a/man2/seccomp.2 b/man2/seccomp.2 index 9ab07f4ab..45a6984df 100644 --- a/man2/seccomp.2 +++ b/man2/seccomp.2 @@ -221,6 +221,11 @@ return a new user-space notification file descriptor. When the filter returns .BR SECCOMP_RET_USER_NOTIF a notification will be sent to this file descriptor. +.IP +At most one seccomp filter using the +.BR SECCOMP_FILTER_FLAG_NEW_LISTENER +flag can be installed for a thread. +.IP See .BR seccomp_user_notif (2) for further details. @@ -789,6 +794,12 @@ capability in its user namespace, or had not set before using .BR SECCOMP_SET_MODE_FILTER . .TP +.BR EBUSY +While installing a new filter, the +.BR SECCOMP_FILTER_FLAG_NEW_LISTENER +flag was specified, +but a previous filter had already been installed with that flag. +.TP .BR EFAULT .IR args was not a valid address. diff --git a/man2/seccomp_user_notif.2 b/man2/seccomp_user_notif.2 index a6025e4d4..d1a406f46 100644 --- a/man2/seccomp_user_notif.2 +++ b/man2/seccomp_user_notif.2 @@ -92,6 +92,7 @@ Consequently, the return value of the (successful) .BR seccomp (2) call is a new "listening" file descriptor that can be used to receive notifications. +Only one such "listener" can be established. .IP \(bu In cases where it is appropriate, the seccomp filter returns the action value .BR SECCOMP_RET_USER_NOTIF . -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/