Re: Compat 32-bit syscall entry from 64-bit task!? [was: Re: [RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF]

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Tue, 17 Jan 2012 22:25:32 -0800

On Tue, Jan 17, 2012 at 9:23 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>  - in that page, do this:
>
>      lea 1f,%edx
>      movl $SYSCALL,%eax
>      movl $-1,4096(%edx)
>  1:
>      int 0x80
>
> and what happens is that the move that *overwrites* the int 0x80 will
> not be noticed by the I$ coherency because it's at another address,
> but by the time you read at $pc-2, you'll get -1, not "int 0x80"

Btw, that's I$ coherency comment is not technically the correct explanation.

The I$ coherency isn't the problem, the problem is that the pipeline
has already fetched the "int 0x80" before the write happens. And the
write - because it's not to the same linear address as the code fetch
- won't trigger the internal "pipeline flush on write to code stream".
So the D$ (and I$) will have the -1 in it, but the instruction fetch
will have walked ahead and seen the "int 80" that existed earlier, and
will execute it.

And the above depends very much on uarch details, so depending on
microarchitecture it may or may not work. But I think the "use a
different virtual address, but same physical address" thing will fake
out all modern x86 cpu's, and your 'ptrace' will see the -1, even
though the system call happened.

Anyway, the *kernel* knows, since the kernel will have seen which
entrypoint it comes through. So we can handle it in the kernel. But
no, you cannot currently securely/reliably use $pc-2 in gdb or ptrace
to determine how the system call was made, afaik.

Of course, limiting things so that you cannot map the same page
executably *and* writably is one solution - and a good idea regardless
- so secure environments can still exist. But even then you could have
races in a multi-threaded environment (they'd just be *much* harder to
trigger for an attacker).

                 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html