Re: bind mount that lies to apps about st_uid?

Kees Cook <keescook@xxxxxxxxxxxx> · Thu, 16 Jul 2015 10:59:37 -0700

On Tue, Jul 14, 2015 at 6:40 PM, Kenton Varda <kenton@xxxxxxxxxxxx> wrote:
> On Tue, Jul 14, 2015 at 6:24 PM, Andrew Lutomirski <andy@xxxxxxx> wrote:
>> On Tue, Jul 14, 2015 at 6:14 PM, Kenton Varda <kenton@xxxxxxxxxxxx> wrote:
>>> Note that we would *love* the ability to more cleanly and robustly
>>> filter / rewrite syscalls in userspace, for exactly this sort of
>>> thing. But it seems the available options (ptrace and LD_PRELOAD) are
>>> far too difficult and quirky to use effectively as-is.
>>
>> The latter is very much on my todo list.  The performance will suck
>> less than ptrace, but it still won't be fantastic.  It'll be extra
>> tricky here because you'll be accessing user memory, too.
>>
>> Kees, have you or any of the Chromium people played with better
>> trapping mechanisms?  There's my awful patch set to help enable
>> something better, but I've tabled it until the x86 entry cleanups are
>> done, because it's just too messy right now.

Not yet, no. So far, using the seccomp ptrace hook
(PTRACE_O_TRACESECCOMP) is how we trap weird stuff. It's actually
pretty decent as long as you're not doing it A LOT. :) The downside is
that changing syscall return values requires per-arch register
manipulation, but once that's written, it's not going to change. (The
seccomp regression test suite already has all this, FWIW.)

> For our purposes, I think I really just want to convert `syscall`
> instructions into regular old calls to some well-defined address where
> I can put my filtering code. I'm fine with the filter being in
> userspace where the process can attack it because the purpose of the
> filter is only compatibility fix-ups, not security enforcement.
>
> seccomp-bpf can basically do this by raising SIGSYS, but it seems like
> things break down when you get into details. E.g. you need the filter
> itself to be able to make syscalls, so you need seccomp to *not* block
> the syscall instructions made by the filter, which seems to require
> that the filter code be loaded in a reliable location in memory, which
> is kind of impossible. Also, performance.
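The location problem arises because struct seccomp_data carries the instruction pointer of the syscall. A filter can therefore exempt the handler's own syscalls only by address. The following is a minimal sketch of such a classic BPF filter. It assumes x86-64 tasks, a little-endian seccomp_data, and handler bounds start/end that are known when the filter is built; build_trap_filter and TRAP_FILTER_LEN are illustrative names, not an existing API. It only builds the program. Installing it with prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) and writing the SIGSYS handler are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#define TRAP_FILTER_LEN 12

/* Hypothetical helper: raise SIGSYS (SECCOMP_RET_TRAP) for syscall `nr`,
 * except when the syscall instruction lies in [start, end), the assumed
 * location of the user-space handler. Assumes the range lies strictly
 * within one 4 GiB region, so only the low 32 bits of the instruction
 * pointer need a range check. `out` must hold TRAP_FILTER_LEN entries. */
size_t build_trap_filter(struct sock_filter *out, uint64_t start,
                         uint64_t end, uint32_t nr)
{
    const uint32_t ip_lo = offsetof(struct seccomp_data, instruction_pointer);
    const uint32_t ip_hi = ip_lo + 4;
    struct sock_filter f[TRAP_FILTER_LEN] = {
        /* 0-2: kill anything that is not a native x86-64 syscall */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* 3-7: allow every syscall issued from inside [start, end) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ip_hi),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (uint32_t)(start >> 32), 0, 3),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ip_lo),
        BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, (uint32_t)start, 0, 1),
        BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, (uint32_t)end, 0, 3),
        /* 8-11: trap the chosen syscall, allow everything else */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };

    memcpy(out, f, sizeof f);
    return TRAP_FILTER_LEN;
}
```

The check is purely by address, so any code that jumps into that range also escapes the filter. That is acceptable for compatibility fix-ups but not as a security boundary.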

Performance will likely be the limiting factor, but I'm curious about
using a ptrace manager and only catching SECCOMP_RET_TRACE seccomp
filter actions. Is it only stat you need to manipulate? If so, you could
catch the syscall entry with seccomp (no need for SIGSYS), then catch the
syscall exit with PTRACE_SYSCALL, check the results, rewrite them, and
PTRACE_CONT. Do you need to catch stat really often (or more than just
stat)?
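The following is a minimal sketch of that flow in C on Linux. The filter returns SECCOMP_RET_TRACE for one syscall. The tracer sees the seccomp stop at entry, steps to the exit with PTRACE_SYSCALL, rewrites the result through a callback, and resumes with PTRACE_CONT. The function names and the rewrite callback are hypothetical; the callback would do the per-arch register manipulation described earlier.

The sketch assumes Linux 4.8 or later. On those kernels the seccomp stop follows syscall-entry, so the next syscall-stop after PTRACE_SYSCALL is the exit. On older kernels, including those current when this thread was written, an extra syscall-entry stop arrives first.

```c
#include <stddef.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/prctl.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Tracee side (hypothetical helper): hand syscall `nr` to the tracer via
 * SECCOMP_RET_TRACE and allow everything else. A real filter should
 * also check seccomp_data.arch first. */
int install_trace_filter(unsigned int nr)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Stop reported when a filter returned SECCOMP_RET_TRACE (syscall entry). */
int is_seccomp_stop(int status)
{
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8));
}

/* Syscall-stop, told apart from a real SIGTRAP by PTRACE_O_TRACESYSGOOD. */
int is_syscall_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}

/* Tracer side. Assumes the child already called PTRACE_TRACEME and
 * stopped itself with raise(SIGSTOP) before installing the filter. */
int trace_filtered_calls(pid_t child, void (*rewrite)(pid_t))
{
    int status;

    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)(PTRACE_O_TRACESECCOMP | PTRACE_O_TRACESYSGOOD)) == -1)
        return -1;
    if (ptrace(PTRACE_CONT, child, NULL, NULL) == -1)
        return -1;

    for (;;) {
        long sig = 0;

        if (waitpid(child, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return 0;
        if (is_seccomp_stop(status)) {
            /* Filtered syscall entry: let it run and stop at its exit. */
            ptrace(PTRACE_SYSCALL, child, NULL, NULL);
            continue;
        }
        if (is_syscall_stop(status))
            rewrite(child);          /* exit: check and rewrite results */
        else if (WIFSTOPPED(status))
            sig = WSTOPSIG(status);  /* pass ordinary signals through */
        ptrace(PTRACE_CONT, child, NULL, (void *)sig);
    }
}
```

Only syscalls that the filter marks with SECCOMP_RET_TRACE cost a round trip to the tracer (two stops each). Everything else runs without tracer involvement, so the overhead stays low as long as the trapped calls are rare.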

-Kees

-- 
Kees Cook
Chrome OS Security