Re: [PATCH v7 3/6] seccomp: add a way to get a listener fd from ptrace

Tycho Andersen <tycho@xxxxxxxx> · Wed, 10 Oct 2018 10:26:22 -0700

On Wed, Oct 10, 2018 at 07:15:02PM +0200, Christian Brauner wrote:
> On Wed, Oct 10, 2018 at 09:54:58AM -0700, Tycho Andersen wrote:
> > On Wed, Oct 10, 2018 at 05:39:57PM +0200, Christian Brauner wrote:
> > > On Wed, Oct 10, 2018 at 05:33:43PM +0200, Jann Horn wrote:
> > > > On Wed, Oct 10, 2018 at 5:32 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> > > > > On Tue, Oct 9, 2018 at 9:36 AM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > > > > > +cc selinux people explicitly, since they probably have opinions on this
> > > > >
> > > > > I just spent about twenty minutes working my way through this thread,
> > > > > and digging through the containers archive trying to get a good
> > > > > understanding of what you guys are trying to do, and I'm not quite
> > > > > sure I understand it all.  However, from what I have seen, this
> > > > > approach looks very ptrace-y to me (I imagine to others as well based
> > > > > on the comments) and because of this I think ensuring the usual ptrace
> > > > > access controls are evaluated, including the ptrace LSM hooks, is the
> > > > > right thing to do.
> > > > 
> > > > Basically the problem is that this new ptrace() API does something
> > > > that doesn't just influence the target task, but also every other task
> > > > that has the same seccomp filter. So the classic ptrace check doesn't
> > > > work here.
> > > 
> > > Just to throw this into the mix: then maybe ptrace() isn't the right
> > > interface and we should just go with the native seccomp() approach for
> > > now.
> > 
> > Please no :).
> > 
> > I don't buy your arguments that 3-syscalls vs. one is better. If I'm
> > doing this setup with a new container, I have to do
> > clone(CLONE_FILES), do this seccomp thing, so that my parent can pick
> > it up again, then do another clone without CLONE_FILES, because in the
> > general case I don't want to share my fd table with the container,
> > wait on the middle task for errors, etc. So we're still doing a bunch
> > of setup, and it feels more awkward than ptrace, with at least as many
> > syscalls, and it only works for your children.
> 
> You're talking about the case where you already have shot yourself in
> the foot by blocking basically all other sensible ways of getting the fd
> out.

Ok, but these other ways involve syscalls too (sendmsg() or whatever).
And if you're going to allow arbitrary policy from your users, you
have to be maximally flexible.
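
To be concrete about what "sendmsg() or whatever" costs, here is a rough
sketch of SCM_RIGHTS fd passing over an AF_UNIX socket. It is
illustrative only and not part of the patch: send_fd() and recv_fd() are
made-up helper names, and it assumes both tasks already share a
connected unix socket and that the installed filter still allows
sendmsg()/recvmsg().

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: pass one fd over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char dummy = 'x';	/* one byte of ordinary data sent with the cmsg */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: returns the received fd, or -1 on any failure. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	/* Only accept a single SCM_RIGHTS message carrying exactly one fd. */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

So the filtered task would call send_fd(sock, listener_fd) and the
manager would call recv_fd(sock). That is at least a socket setup plus
one syscall on each side, and it only works if the policy leaves those
syscalls available.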

> Also, this was meant to show that parts of your initial justification
> for implementing the ptrace() way of getting an fd don't really stand.
> And it doesn't really. Even with ptrace() you can get into situations
> where you're not able to get an fd. (see prior threads)

Of course. I guess my point was that we shouldn't design an API that's
impossible to use. I'll drop the notes about sendmsg() from the commit
message.

Tycho