On Wed, Oct 10, 2018 at 05:39:57PM +0200, Christian Brauner wrote:
> On Wed, Oct 10, 2018 at 05:33:43PM +0200, Jann Horn wrote:
> > On Wed, Oct 10, 2018 at 5:32 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> > > On Tue, Oct 9, 2018 at 9:36 AM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > > > +cc selinux people explicitly, since they probably have opinions on this
> > >
> > > I just spent about twenty minutes working my way through this thread,
> > > and digging through the containers archive trying to get a good
> > > understanding of what you guys are trying to do, and I'm not quite
> > > sure I understand it all. However, from what I have seen, this
> > > approach looks very ptrace-y to me (I imagine to others as well based
> > > on the comments) and because of this I think ensuring the usual ptrace
> > > access controls are evaluated, including the ptrace LSM hooks, is the
> > > right thing to do.
> >
> > Basically the problem is that this new ptrace() API does something
> > that doesn't just influence the target task, but also every other task
> > that has the same seccomp filter. So the classic ptrace check doesn't
> > work here.
>
> Just to throw this into the mix: then maybe ptrace() isn't the right
> interface and we should just go with the native seccomp() approach for
> now.

Please no :). I don't buy your arguments that 3-syscalls vs. one is
better. If I'm doing this setup with a new container, I have to do
clone(CLONE_FILES), do this seccomp thing, so that my parent can pick
it up again, then do another clone without CLONE_FILES, because in the
general case I don't want to share my fd table with the container,
wait on the middle task for errors, etc. So we're still doing a bunch
of setup, and it feels more awkward than ptrace, with at least as many
syscalls, and it only works for your children.

I don't mind leaving capable(CAP_SYS_ADMIN) for the ptrace() part,
though. So if that's ok, then I think we can agree :)

Tycho