On Thu, May 7, 2009 at 00:03, Roland McGrath <roland@xxxxxxxxxx> wrote:
>
> That is not a "ptrace problem" per se at all. It's an intrinsic problem
> with any method based on "generic" syscall interception, if the filtering
> and enforcement decisions depend on examining user memory.

Yes, this is indeed the main problem that we are aware of. It can be avoided by suspending all threads while user memory is being inspected, but that is a horrible price to pay. (See also below for an alternative approach that could, in principle, be adapted for use with ptrace.)

> The only reason seccomp does not have this "reliability problem" is that
> its filtering is trivial and depends only on registers (in fact, only on
> one register, the syscall number).

Simplicity is really the beauty of seccomp. It is very easy to verify that it does the right thing from a security point of view, because any attempt to make an unsafe system call results in the kernel terminating the program. This is much preferable to most ptrace-based solutions, which are more difficult to audit for correctness.

The downside is that the sandboxed code needs to delegate execution of most of its system calls to a monitor process. This is slow and rather awkward. However, thanks to the magic of clone(), (almost) all system calls can in fact be serialized, sent to the monitor process, have their arguments safely inspected, and then be executed on behalf of the sandboxed process. The details are tedious, but we believe they are solvable with current kernel APIs.

The other issue is performance. For system calls that are known to be safe, we would rather not pay the penalty of redirecting them. A kernel patch that made seccomp more efficient for these system calls would be very welcome, and we will post such a patch for discussion shortly.

> If you want to do checks that depend on shared or volatile state, then
> syscall interception is really not the proper mechanism for you.

We agree that syscall interception is a poor abstraction level for a sandbox. But in the short term, we need to work with the APIs that are available in today's kernels, and we believe that seccomp is one of the more promising APIs currently available to us.

Markus