Re: [kernel-hardening] [RFC PATCH 1/1] seccomp: provide information about the previous syscall

Jann Horn <jann@xxxxxxxxx> · Fri, 22 Jan 2016 11:48:37 +0100

On Fri, Jan 22, 2016 at 03:30:00PM +0900, Daniel Sangorrin wrote:
> This patch allows applications to restrict the order in which
> its system calls may be requested. In order to do that, we
> provide seccomp-BPF scripts with information about the
> previous system call requested.
>
> An example use case consists of detecting (and stopping) return
> oriented attacks that disturb the normal execution flow of
> a user program.

The intent here is to mitigate attacks in which an attacker has
e.g. a function pointer overwrite without a high degree of stack
control or the ability to perform a stack pivot, correct? So that
e.g. a one-gadget system() call won't succeed?

Do you have data on how effective this protection is using just
the previous system call number?

I think that for example, the "magic ROP gadget" in glibc that
can be used given just a single pointer overwrite and stdin
control (https://gist.github.com/zachriggle/ca24daf4e8be953a3f96),
which (as far as I can tell) is in the middle of the system()
implementation, could be used as long as a transition to one of
the following syscalls is allowed:

 - rt_sigaction
 - rt_sigprocmask
 - clone
 - execve

I'm not sure how many interesting syscalls are typically allowed
to transition to one of those; perhaps you can comment on that?

However, when exploiting network servers, this magic gadget
won't help much: an attacker would probably have to either
call into an interesting function in the application or use
ROP. In the latter case, this protection won't help much
either. Most syscalls just return -EFAULT / -EINVAL when you
supply nonsense arguments, so ROPping through a "pop rax;ret"
gadget and a "syscall;ret" gadget should make it fairly easy
to step through whatever syscall transitions the filter
expects and thereby bypass the protection. There are a bunch
of occurrences of both gadgets in Debian's libc (and these
are just the trivial ones):

$ hexdump -C /lib/x86_64-linux-gnu/libc-2.19.so | grep '58 c3'
000382e0  00 00 48 8b 00 5b 8b 40  58 c3 48 8d 05 4f 8a 36  |..H..[.@X.H..O.6|
000383b0  58 c3 48 8d 05 87 89 36  00 48 39 c3 74 0e 48 89  |X.H....6.H9.t.H.|
00038450  40 58 c3 48 8d 05 e6 88  36 00 48 39 c3 74 0e 48  |@X.H....6.H9.t.H|
000d9a00  48 89 44 24 18 e8 56 ff  ff ff 48 83 c4 58 c3 90  |H.D$..V...H..X..|
000e51d0  c3 0f 1f 80 00 00 00 00  48 8b 40 58 c3 0f 1f 00  |........H.@X....|
000ea2f0  48 83 3d 58 c3 2b 00 00  48 8b 1d 69 8b 2b 00 64  |H.=X.+..H..i.+.d|
00160520  48 c3 fa ff 58 c3 fa ff  68 c3 fa ff 80 c3 fa ff  |H...X...h.......|
00171470  58 c3 f8 ff 84 60 02 00  74 c3 f8 ff 94 62 02 00  |X....`..t....b..|
$ hexdump -C /lib/x86_64-linux-gnu/libc-2.19.so | grep '0f 05 c3'
000b85b0  b8 6e 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |.n..............|
000b85c0  b8 66 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |.f..............|
000b85d0  b8 6b 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |.k..............|
000b85e0  b8 68 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |.h..............|
000b85f0  b8 6c 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |.l..............|
000b87f0  b8 6f 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |.o..............|
000d9260  b8 5f 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |._..............|
000e6400  b8 e4 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |................|
000fff60  48 63 3f b8 03 00 00 00  0f 05 c3 0f 1f 44 00 00  |Hc?..........D..|
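
The hexdump/grep search above only sees byte sequences that sit within
one half of a 16-byte display row. A minimal sketch of the same search
over a raw buffer, which also catches sequences that straddle a row, is
below. The function name count_pattern is an assumption made for
illustration, and loading the file into memory is left to the caller.
A raw byte match does not show that the bytes lie in an executable
mapping, so the count is only an upper bound on usable gadgets.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count occurrences of the byte sequence `pat` (length `patlen`) in `buf`.
 * Overlapping matches are counted. Hypothetical helper for illustration;
 * it says nothing about whether a match is in executable memory.
 */
static size_t count_pattern(const unsigned char *buf, size_t len,
			    const unsigned char *pat, size_t patlen)
{
	size_t count = 0;

	if (patlen == 0 || len < patlen)
		return 0;
	for (size_t i = 0; i + patlen <= len; i++) {
		if (memcmp(buf + i, pat, patlen) == 0)
			count++;
	}
	return count;
}
```

Called with the two-byte sequence 58 c3 ("pop rax;ret") or the
three-byte sequence 0f 05 c3 ("syscall;ret"), it reports how often each
sequence appears in the buffer.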

So an attacker would craft the stack like this:
[pop rax;ret address]
[first syscall for transition]
[syscall;ret address]
[pop rax;ret address]
[second syscall for transition]
[syscall;ret address]
[...]
[normal ROP for whatever the attacker wants to do]
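
The layout above can be sketched as an array of 64-bit words. This only
illustrates the ordering of the words. The function name
build_transition_chain and both gadget addresses are placeholders
supplied by the caller, not values taken from any real binary, and the
sketch assumes (as the argument above does) that a syscall which fails
with an error still counts as the "previous" one.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * For each syscall number in `nrs`, emit the three words
 * [pop rax;ret] [nr] [syscall;ret] into `out`.
 * Returns the number of words written, or 0 if `out` is too small.
 * Hypothetical helper; the gadget addresses are placeholders.
 */
static size_t build_transition_chain(uint64_t *out, size_t out_words,
				     uint64_t pop_rax_ret, uint64_t syscall_ret,
				     const uint64_t *nrs, size_t n)
{
	size_t w = 0;

	if (n > out_words / 3)
		return 0;
	for (size_t i = 0; i < n; i++) {
		out[w++] = pop_rax_ret;	/* pops the next word into rax */
		out[w++] = nrs[i];	/* syscall number the filter expects next */
		out[w++] = syscall_ret;	/* issues it; the arguments may be nonsense */
	}
	return w;
}
```

The attacker's actual payload would follow these words on the stack.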

Maybe someone who knows a bit more about binary exploitation
can comment on this, especially on how likely it is that
manipulating a network service's program flow succeeds without
ROP in the presence of full ASLR and so on.

Also, there is a potential functional issue: What about signal handlers?
As far as I can tell, a signal can interrupt the program after any
syscall, so the filter would have to allow transitions from every
syscall to any syscall that can occur at the start of a signal handler.
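
To make the signal handler concern concrete, here is a user-space model
of a transition policy as a flat n-by-n table. It is only a sketch. The
table layout and the name handler_entry_always_allowed are assumptions
made for illustration and do not reflect how the patch or a seccomp-BPF
program would store the policy.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * `allowed` is a flat n*n table: allowed[prev * n + next] is true if the
 * policy permits syscall `next` directly after syscall `prev`.
 * Returns true only if `handler_first`, the first syscall made by a signal
 * handler, is permitted after every possible previous syscall.
 * Hypothetical model for illustration.
 */
static bool handler_entry_always_allowed(const bool *allowed, size_t n,
					 size_t handler_first)
{
	for (size_t prev = 0; prev < n; prev++) {
		if (!allowed[prev * n + handler_first])
			return false;	/* a signal after `prev` would violate the policy */
	}
	return true;
}
```

Under this model, a policy is only safe for a program with signal
handlers if the check holds for the first syscall of every handler,
which loosens the allowed transitions considerably.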

> @@ -443,6 +448,11 @@ static long seccomp_attach_filter(unsigned int flags,
>  			return ret;
>  	}
>  
> +	/* Initialize the prev_nr field only once */
> +	if (current->seccomp.filter == NULL)
> +		current->seccomp.prev_nr =
> +			syscall_get_nr(current, task_pt_regs(current));
> +
>  	/*
>  	 * If there is an existing filter, make it the prev and don't drop its
>  	 * task reference.

What about SECCOMP_FILTER_FLAG_TSYNC? When a thread is transitioned from
SECCOMP_MODE_DISABLED to SECCOMP_MODE_FILTER by another thread, its initial
prev_nr will be 0, which would e.g. appear to be the read() syscall on
x86_64, right?
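
A small user-space check of the point about 0 being a valid syscall
number is sketched below. PREV_NR_UNSET is a hypothetical sentinel that
is not part of the patch. It is shown only as one possible way to
distinguish "no previous syscall recorded" from a real syscall number.
The sketch assumes Linux headers that provide SYS_read.

```c
#include <sys/syscall.h>

/* Hypothetical sentinel, not part of the patch: no real syscall is numbered -1. */
#define PREV_NR_UNSET (-1)

/*
 * Returns 1 if a zero-initialised prev_nr would be indistinguishable from
 * a real read() on the architecture this is compiled for, 0 otherwise.
 */
static int zero_prev_nr_looks_like_read(void)
{
	return SYS_read == 0;
}
```

On x86_64 the function returns 1. On architectures where read() has a
different number, a zero prev_nr would instead alias whatever syscall
is numbered 0 there, if any.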