On Fri, Jan 22, 2016 at 03:30:00PM +0900, Daniel Sangorrin wrote:
> This patch allows applications to restrict the order in which
> its system calls may be requested. In order to do that, we
> provide seccomp-BPF scripts with information about the
> previous system call requested.
>
> An example use case consists of detecting (and stopping) return
> oriented attacks that disturb the normal execution flow of
> a user program.

The intent here is to mitigate attacks in which an attacker has
e.g. a function pointer overwrite without a high degree of stack
control or the ability to perform a stack pivot, correct? So that
e.g. a one-gadget system() call won't succeed?

Do you have data on how effective this protection is using just
the previous system call number? I think that for example, the
"magic ROP gadget" in glibc that can be used given just a single
pointer overwrite and stdin control
(https://gist.github.com/zachriggle/ca24daf4e8be953a3f96), which
(as far as I can tell) is in the middle of the system()
implementation, could be used as long as a transition to one of
the following syscalls is allowed:

 - rt_sigaction
 - rt_sigprocmask
 - clone
 - execve

I'm not sure how many interesting syscalls typically transition to
that, perhaps you can comment on that?

However, when exploiting network servers, this magic gadget won't
help much - an attacker would probably have to either call into an
interesting function in the application or use ROP. In the latter
case, this protection won't help much - especially considering that
most syscalls just return -EFAULT / -EINVAL when you supply nonsense
arguments, ROPping through a "pop rax;ret" gadget and a "syscall;ret"
gadget should make it fairly easy to bypass the protection.
There are a bunch of occurrences of both gadgets in Debian's libc
(and these are just the trivial ones):

$ hexdump -C /lib/x86_64-linux-gnu/libc-2.19.so | grep '58 c3'
000382e0  00 00 48 8b 00 5b 8b 40  58 c3 48 8d 05 4f 8a 36  |..H..[.@X.H..O.6|
000383b0  58 c3 48 8d 05 87 89 36  00 48 39 c3 74 0e 48 89  |X.H....6.H9.t.H.|
00038450  40 58 c3 48 8d 05 e6 88  36 00 48 39 c3 74 0e 48  |@X.H....6.H9.t.H|
000d9a00  48 89 44 24 18 e8 56 ff  ff ff 48 83 c4 58 c3 90  |H.D$..V...H..X..|
000e51d0  c3 0f 1f 80 00 00 00 00  48 8b 40 58 c3 0f 1f 00  |........H.@X....|
000ea2f0  48 83 3d 58 c3 2b 00 00  48 8b 1d 69 8b 2b 00 64  |H.=X.+..H..i.+.d|
00160520  48 c3 fa ff 58 c3 fa ff  68 c3 fa ff 80 c3 fa ff  |H...X...h.......|
00171470  58 c3 f8 ff 84 60 02 00  74 c3 f8 ff 94 62 02 00  |X....`..t....b..|
$ hexdump -C /lib/x86_64-linux-gnu/libc-2.19.so | grep '0f 05 c3'
000b85b0  b8 6e 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |.n..............|
000b85c0  b8 66 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |.f..............|
000b85d0  b8 6b 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |.k..............|
000b85e0  b8 68 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |.h..............|
000b85f0  b8 6c 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |.l..............|
000b87f0  b8 6f 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |.o..............|
000d9260  b8 5f 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |._..............|
000e6400  b8 e4 00 00 00 0f 05 c3  0f 1f 84 00 00 00 00 00  |................|
000fff60  48 63 3f b8 03 00 00 00  0f 05 c3 0f 1f 44 00 00  |Hc?..........D..|

So an attacker would craft the stack like this:

[pop rax;ret address]
[first syscall for transition]
[syscall;ret address]
[pop rax;ret address]
[second syscall for transition]
[syscall;ret address]
[...]
[normal ROP for whatever the attacker wants to do]

Maybe someone who knows a bit more about binary exploitation can
comment on this, especially how likely it is that a manipulation
of a network service's program flow is successful in the presence
of full ASLR and so on without ROP.

Also, there is a potential functional issue: What about signal
handlers? Signal handlers will require transitions from all syscalls
to any syscall that occurs at the start of a signal handler to be
allowed as far as I can tell.

> @@ -443,6 +448,11 @@ static long seccomp_attach_filter(unsigned int flags,
>  		return ret;
>  	}
>
> +	/* Initialize the prev_nr field only once */
> +	if (current->seccomp.filter == NULL)
> +		current->seccomp.prev_nr =
> +			syscall_get_nr(current, task_pt_regs(current));
> +
>  	/*
>  	 * If there is an existing filter, make it the prev and don't drop its
>  	 * task reference.

What about SECCOMP_FILTER_FLAG_TSYNC? When a thread is transitioned
from SECCOMP_MODE_DISABLED to SECCOMP_MODE_FILTER by another thread,
its initial prev_nr will be 0, which would e.g. appear to be the
read() syscall on x86_64, right?