On Mon, Sep 7, 2020 at 7:25 AM Christian Brauner
<christian.brauner@xxxxxxxxxx> wrote:
>
> On Mon, Sep 07, 2020 at 07:15:52AM -0700, Andy Lutomirski wrote:
> >
> > > On Sep 7, 2020, at 3:15 AM, Christian Brauner <christian.brauner@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Sep 04, 2020 at 04:31:44PM -0400, Gabriel Krisman Bertazi wrote:
> > >> Syscall User Dispatch (SUD) must take precedence over seccomp, since the
> > >> use case is emulation (it can be invoked with a different ABI) such that
> > >> seccomp filtering by syscall number doesn't make sense in the first
> > >> place. In addition, either the syscall is dispatched back to userspace,
> > >> in which case there is no resource for seccomp to protect, or the
> > >
> > > Tbh, I'm torn here. I'm not a super clever attacker but it feels to me
> > > that this is still at least a clever way to circumvent a seccomp
> > > sandbox.
> > > If I'd be confined by a seccomp profile that would cause me to be
> > > SIGKILLed when I try to do open() I could prctl() myself to do user
> > > dispatch to prevent that from happening, no?
> > >
> >
> > Not really, I think. The idea is that you didn’t actually do open().
> > You did a SYSCALL instruction which meant something else, and the
> > syscall dispatch correctly prevented the kernel from misinterpreting
> > it as open().
>
> Right, for the case where you're e.g. emulating windows syscalls that's
> true. I was thinking when you're running natively on Linux: couldn't I
> first load a seccomp profile "kill me if someone does an open()", then
> I exec() the target binary and that binary is setup to do
> prctl(USER_DISPATCH) first thing. I guess, it's ok because as far as I
> had time to read it this is a nothing or all mechanism, i.e. _all_
> system calls are re-routed in contrast to e.g. seccomp where I could do
> this per-syscall. So for user-dispatch it wouldn't make sense to use it
> on Linux per se. Still makes me a little uneasy. :)

There's an escape hatch, so processes using this can still make
syscalls.

Maybe think about it another way: a process using user dispatch should
definitely *not* trigger seccomp user notifiers, errno returns, or
ptrace events, since they'll all do the wrong thing. IMO RET_KILL is
the same.

Barring some very severe defect, there's no way a program can use user
dispatch to escape seccomp. A program could use user dispatch to allow
it to do:

    mov $__NR_open, %rax
    syscall

without dying despite the presence of a filter that would kill the
process if it tried to do open(), but this doesn't bypass the filter
at all. The process could just as easily have done:

    mov $__NR_open, %rax
    jmp magic_stub

without tripping the filter, since no system call actually happens here.

--Andy
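The ordering argued for in this thread (user dispatch is checked before seccomp, and syscalls issued from the escape hatch still go through the filter) can be sketched as a small C model. This is a hypothetical illustration, not kernel code: `enter_syscall`, `struct dispatch_config`, `struct seccomp_model`, and the boolean selector are all invented names standing in for whatever the patch series actually defines, and the filter is reduced to "kill on one syscall number".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of syscall entry ordering; NOT kernel code. */

/* 2 is __NR_open on x86-64. */
#define MODEL_NR_OPEN 2

enum entry_outcome {
    OUTCOME_SIGSYS,   /* dispatched back to userspace: no syscall happens */
    OUTCOME_KILLED,   /* the seccomp filter killed the process */
    OUTCOME_EXECUTED  /* the kernel ran the syscall */
};

struct dispatch_config {
    bool enabled;          /* user dispatch switched on (assumed via prctl()) */
    uintptr_t hatch_start; /* escape hatch: syscalls issued from here... */
    uintptr_t hatch_len;   /* ...are never dispatched */
    bool selector_block;   /* assumed stand-in for the selector: true = dispatch */
};

struct seccomp_model {
    long killed_nr;        /* the single syscall number the filter kills */
};

static enum entry_outcome enter_syscall(const struct dispatch_config *sud,
                                        const struct seccomp_model *filter,
                                        uintptr_t ip, long nr)
{
    assert(sud != NULL && filter != NULL);

    if (sud->enabled) {
        /* Unsigned wraparound turns this into a single range check:
         * an ip below hatch_start wraps to a huge value. */
        bool in_hatch = ip - sud->hatch_start < sud->hatch_len;

        /* Dispatch is checked first. A trapped SYSCALL instruction is
         * bounced to userspace and never reaches seccomp, because no
         * system call is actually performed. */
        if (!in_hatch && sud->selector_block)
            return OUTCOME_SIGSYS;
    }

    /* Every syscall that really executes is still filtered. */
    if (nr == filter->killed_nr)
        return OUTCOME_KILLED;
    return OUTCOME_EXECUTED;
}
```

In this model, an open() number issued outside the hatch yields `OUTCOME_SIGSYS` (nothing was opened), while the same number issued from inside the hatch, or with dispatch off, still yields `OUTCOME_KILLED`. That mirrors the argument above: dispatch only changes how a SYSCALL instruction is interpreted and never lets a real open() slip past the filter.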