[PATCH 00/14] run seccomp after ptrace

Kees Cook <keescook@xxxxxxxxxxxx> · Thu, 9 Jun 2016 14:01:50 -0700

There has been a long-standing (and documented) issue with seccomp
where ptrace can be used to change a syscall out from under seccomp.
This is a problem for containers and other seccomp-filtered
environments where ptrace needs to remain available, since it allows
the seccomp filter to be bypassed.

Since the ptrace attack surface is already available for any allowed
syscall, moving seccomp after ptrace doesn't increase the attack
surface that is actually reachable. It also improves tracing: for
example, tracers will be notified of syscall entry before seccomp
sends a SIGSYS, which makes debugging filters much easier.

The per-architecture changes do make one (hopefully small)
semantic change: since ptrace now comes first, it may request that a
syscall be skipped. Running seccomp after that doesn't make sense, so
if ptrace wants to skip a syscall, the entry path will bail out early,
similar to how it already did for seccomp. This means that skipped
syscalls will not be fed through audit, though that likely just
avoids noise.

This series first cleans up seccomp to remove the now-unneeded
two-phase entry, fixes the SECCOMP_RET_TRACE hole (same as the
ptrace hole above), and then reorders seccomp after ptrace on
each architecture.

Thanks,

-Kees