Re: For review: seccomp_user_notif(2) manual page

Christian Brauner <christian.brauner@xxxxxxxxxxxxx> · Thu, 1 Oct 2020 19:05:43 +0200

On Thu, Oct 01, 2020 at 05:47:54PM +0200, Jann Horn wrote:
> On Thu, Oct 1, 2020 at 2:54 PM Christian Brauner
> <christian.brauner@xxxxxxxxxxxxx> wrote:
> > On Wed, Sep 30, 2020 at 05:53:46PM +0200, Jann Horn via Containers wrote:
> > > On Wed, Sep 30, 2020 at 1:07 PM Michael Kerrisk (man-pages)
> > > <mtk.manpages@xxxxxxxxx> wrote:
> > > > NOTES
> > > >        The file descriptor returned when seccomp(2) is employed with the
> > > >        SECCOMP_FILTER_FLAG_NEW_LISTENER  flag  can  be  monitored  using
> > > >        poll(2), epoll(7), and select(2).  When a notification  is  pend‐
> > > >        ing,  these interfaces indicate that the file descriptor is read‐
> > > >        able.
> > >
> > > We should probably also point out somewhere that, as
> > > include/uapi/linux/seccomp.h says:
> > >
> > >  * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF
> > >  * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the
> > >  * same syscall, the most recently added filter takes precedence. This means
> > >  * that the new SECCOMP_RET_USER_NOTIF filter can override any
> > >  * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all
> > >  * such filtered syscalls to be executed by sending the response
> > >  * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally
> > >  * be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE.
> > >
> > > In other words, from a security perspective, you must assume that the
> > > target process can bypass any SECCOMP_RET_USER_NOTIF (or
> > > SECCOMP_RET_TRACE) filters unless it is completely prohibited from
> > > calling seccomp(). This should also be noted over in the main
> > > seccomp(2) manpage, especially the SECCOMP_RET_TRACE part.
> >
> > So I was actually wondering about this when I skimmed this and a while
> > ago but forgot about this again... Afaict, you can only ever load a
> > single filter with SECCOMP_FILTER_FLAG_NEW_LISTENER set. If there
> > already is a filter with the SECCOMP_FILTER_FLAG_NEW_LISTENER property
> > in the tasks filter hierarchy then the kernel will refuse to load a new
> > one?
> >
> > static struct file *init_listener(struct seccomp_filter *filter)
> > {
> >         struct file *ret = ERR_PTR(-EBUSY);
> >         struct seccomp_filter *cur;
> >
> >         for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> >                 if (cur->notif)
> >                         goto out;
> >         }
> >
> > shouldn't that be sufficient to guarantee that USER_NOTIF filters can't
> > override each other for the same task simply because there can only ever
> > be a single one?
> 
> Good point. Exceeeept that that check seems ineffective because this
> happens before we take the locks that guard against TSYNC, and also
> before we decide to which existing filter we want to chain the new
> filter. So if two threads race with TSYNC, I think they'll be able to
> chain two filters with listeners together.

That's a bug, imho. I don't have source code in front of me right now
though.

> 
> I don't know whether we want to eternalize this "only one listener
> across all the filters" restriction in the manpage though, or whether
> the man page should just say that the kernel currently doesn't support
> it but that security-wise you should assume that it might at some
> point.

Maybe. I would argue that it might be worth having at least a new
flag/option to indicate either "This is a non-overridable filter." or at
least for the seccomp notifier have an option to indicate that no other
notifer can be installed.

Christian