Re: [PATCH v2 2/2] Documentation: prctl/seccomp_filter

Hrm, I may need to guard sample compilation based on host arch and not
just target arch. Documentation v3 will be on the way once I have that
behaving properly. :/

Sorry!
will

On Wed, Jan 11, 2012 at 5:19 PM, Will Drewry <wad@xxxxxxxxxxxx> wrote:
> Document how system call filtering with BPF works and
> may be used.  Includes an example for x86 (32-bit).
>
> Signed-off-by: Will Drewry <wad@xxxxxxxxxxxx>
> ---
>  Documentation/prctl/seccomp_filter.txt |   99 ++++++++++++++++++++++++++++++++
>  samples/Makefile                       |    2 +-
>  samples/seccomp/Makefile               |   12 ++++
>  samples/seccomp/bpf-example.c          |   74 ++++++++++++++++++++++++
>  4 files changed, 186 insertions(+), 1 deletions(-)
>  create mode 100644 Documentation/prctl/seccomp_filter.txt
>  create mode 100644 samples/seccomp/Makefile
>  create mode 100644 samples/seccomp/bpf-example.c
>
> diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
> new file mode 100644
> index 0000000..15d4645
> --- /dev/null
> +++ b/Documentation/prctl/seccomp_filter.txt
> @@ -0,0 +1,99 @@
> +               Seccomp filtering
> +               =================
> +
> +Introduction
> +------------
> +
> +A large number of system calls are exposed to every userland process
> +with many of them going unused for the entire lifetime of the process.
> +As system calls change and mature, bugs are found and eradicated.  A
> +certain subset of userland applications benefit by having a reduced set
> +of available system calls.  The resulting set reduces the total kernel
> +surface exposed to the application.  System call filtering is meant for
> +use with those applications.
> +
> +Seccomp filtering provides a means for a process to specify a filter
> +for incoming system calls.  The filter is expressed as a Berkeley Packet
> +Filter program, as with socket filters, except that the data operated on
> +is the current user_regs_struct.  This allows for expressive filtering
> +of system calls using the pre-existing system call ABI and using a filter
> +program language with a long history of being exposed to userland.
> +Additionally, BPF makes it impossible for users of seccomp to fall prey to
> +time-of-check-time-of-use (TOCTOU) attacks that are common in system call
> +interposition frameworks because the evaluated data is solely register state
> +just after system call entry.
> +
> +What it isn't
> +-------------
> +
> +System call filtering isn't a sandbox.  It provides a clearly defined
> +mechanism for minimizing the exposed kernel surface.  Beyond that,
> +policy for logical behavior and information flow should be managed with
> +a combination of other system hardening techniques and, potentially, an
> +LSM of your choosing.  Expressive, dynamic filters provide further options down
> +this path (avoiding pathological sizes or selecting which of the multiplexed
> +system calls in socketcall() is allowed, for instance) which could be
> +construed, incorrectly, as a more complete sandboxing solution.
> +
> +Usage
> +-----
> +
> +An additional seccomp mode is added, but it is not set directly by the
> +consuming process.  The new mode, '2', is only available if
> +CONFIG_SECCOMP_FILTER is set, and it is enabled using prctl with the
> +PR_ATTACH_SECCOMP_FILTER argument.
> +
> +Interacting with seccomp filters is done using one prctl(2) call.
> +
> +PR_ATTACH_SECCOMP_FILTER:
> +       Allows the specification of a new filter using a BPF program.
> +       The BPF program will be executed over user_regs_struct data
> +       captured at system call entry, with the system call number
> +       resident in orig_[register].  To allow a system call, the size
> +       of the data must be returned.  At present, all other return values
> +       result in the system call being blocked, but it is recommended to
> +       return 0 in those cases.  This will allow for future custom return
> +       values to be introduced, if ever desired.
> +
> +       Usage:
> +               prctl(PR_ATTACH_SECCOMP_FILTER, prog);
> +
> +       The 'prog' argument is a pointer to a struct sock_fprog which will
> +       contain the filter program.  If the program is invalid, the call
> +       will return -1 and set errno to EINVAL.
> +
> +       The struct user_regs_struct the @prog will see is based on the
> +       personality of the task at the time of this prctl call.  Additionally,
> +       is_compat_task is also tracked for the @prog.  This means that once set
> +       the calling task will have all of its system calls blocked if it
> +       switches its system call ABI (via personality or other means).
> +
> +       If the @prog is installed while the task has CAP_SYS_ADMIN in its user
> +       namespace, the @prog will be marked as inheritable across execve.  Any
> +       inherited filters are still subject to the system call ABI constraints
> +       above and any ABI mismatched system calls will result in process death.
> +
> +       Additionally, if prctl(2) is allowed by the attached filter,
> +       additional filters may be layered on which will increase evaluation
> +       time, but allow for further decreasing the attack surface during
> +       execution of a process.
> +
> +The above call returns 0 on success and non-zero on error.
> +
> +Example
> +-------
> +
> +samples/seccomp/bpf-example.c shows an example process that allows read from
> +stdin, write to stdout/stderr, exit, and signal returns for 32-bit x86.
> +
> +Caveats
> +-------
> +
> +- execve will fail unless the most recently attached filter was installed by
> +  a process with CAP_SYS_ADMIN (in its namespace).
> +
> +Adding architecture support
> +---------------------------
> +
> +Any platform with seccomp support will support seccomp filters
> +as long as CONFIG_SECCOMP_FILTER is enabled.
> diff --git a/samples/Makefile b/samples/Makefile
> index 6280817..f29b19c 100644
> --- a/samples/Makefile
> +++ b/samples/Makefile
> @@ -1,4 +1,4 @@
>  # Makefile for Linux samples code
>
>  obj-$(CONFIG_SAMPLES)  += kobject/ kprobes/ tracepoints/ trace_events/ \
> -                          hw_breakpoint/ kfifo/ kdb/ hidraw/
> +                          hw_breakpoint/ kfifo/ kdb/ hidraw/ seccomp/
> diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile
> new file mode 100644
> index 0000000..80dc8e4
> --- /dev/null
> +++ b/samples/seccomp/Makefile
> @@ -0,0 +1,12 @@
> +# kbuild trick to avoid linker error. Can be omitted if a module is built.
> +obj- := dummy.o
> +
> +# List of programs to build
> +hostprogs-$(CONFIG_X86_32) := bpf-example
> +bpf-example-objs := bpf-example.o
> +
> +# Tell kbuild to always build the programs
> +always := $(hostprogs-y)
> +
> +HOSTCFLAGS_bpf-example.o += -I$(objtree)/usr/include -m32
> +HOSTLOADLIBES_bpf-example += -m32
> diff --git a/samples/seccomp/bpf-example.c b/samples/seccomp/bpf-example.c
> new file mode 100644
> index 0000000..f98b70a
> --- /dev/null
> +++ b/samples/seccomp/bpf-example.c
> @@ -0,0 +1,74 @@
> +/*
> + * Seccomp BPF example
> + *
> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@xxxxxxxxxxxx>
> + * Author: Will Drewry <wad@xxxxxxxxxxxx>
> + *
> + * The code may be used by anyone for any purpose,
> + * and can serve as a starting point for developing
> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
> + */
> +
> +#include <asm/unistd.h>
> +#include <linux/filter.h>
> +#include <stdio.h>
> +#include <stddef.h>
> +#include <sys/prctl.h>
> +#include <sys/user.h>
> +#include <unistd.h>
> +
> +#ifndef PR_ATTACH_SECCOMP_FILTER
> +#      define PR_ATTACH_SECCOMP_FILTER 36
> +#endif
> +
> +#define regoffset(_reg) (offsetof(struct user_regs_struct, _reg))
> +static int install_filter(void)
> +{
> +       struct sock_filter filter[] = {
> +               /* Grab the system call number */
> +               BPF_STMT(BPF_LD+BPF_W+BPF_IND, regoffset(orig_eax)),
> +               /* Jump table for the allowed syscalls */
> +               BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 10, 0),
> +               BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 9, 0),
> +               BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 8, 0),
> +               BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 7, 0),
> +               BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0),
> +               BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 2, 6),
> +
> +               /* Check that read is only using stdin. */
> +               BPF_STMT(BPF_LD+BPF_W+BPF_IND, regoffset(ebx)),
> +               BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDIN_FILENO, 3, 4),
> +
> +               /* Check that write is only using stdout/stderr */
> +               BPF_STMT(BPF_LD+BPF_W+BPF_IND, regoffset(ebx)),
> +               BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDOUT_FILENO, 1, 0),
> +               BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDERR_FILENO, 0, 1),
> +
> +               /* Put the "accept" value in A */
> +               BPF_STMT(BPF_LD+BPF_W+BPF_LEN, 0),
> +
> +               BPF_STMT(BPF_RET+BPF_A,0),
> +       };
> +       struct sock_fprog prog = {
> +               .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
> +               .filter = filter,
> +       };
> +       if (prctl(PR_ATTACH_SECCOMP_FILTER, &prog)) {
> +               perror("prctl");
> +               return 1;
> +       }
> +       return 0;
> +}
> +
> +#define payload(_c) _c, sizeof(_c)
> +int main(int argc, char **argv) {
> +       char buf[4096];
> +       ssize_t bytes = 0;
> +       if (install_filter())
> +               return 1;
> +       syscall(__NR_write, STDOUT_FILENO, payload("OHAI! WHAT IS YOUR NAME? "));
> +       bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf));
> +       syscall(__NR_write, STDOUT_FILENO, payload("HELLO, "));
> +       syscall(__NR_write, STDOUT_FILENO, buf, bytes);
> +       return 0;
> +}
> --
> 1.7.5.4
>
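
The jump offsets in the quoted filter are relative to the instruction that
follows the jump (target = pc + 1 + offset), which makes them easy to break
when the table is edited.  Below is a minimal sketch of a userspace sanity
check, assuming only the classic BPF macros from <linux/filter.h>.
check_jumps() is a hypothetical helper name, not part of the patch, and it
covers far less than the kernel's own filter validation:

```c
#include <assert.h>
#include <stddef.h>
#include <linux/filter.h>

/*
 * Illustrative only: return 0 if every conditional jump in a classic BPF
 * program lands inside the program and the final instruction is a return,
 * -1 otherwise.
 */
static int check_jumps(const struct sock_filter *f, size_t len)
{
	size_t pc;

	assert(f != NULL);

	/* A program must end in a return so execution cannot run off the end. */
	if (len == 0 || BPF_CLASS(f[len - 1].code) != BPF_RET)
		return -1;

	for (pc = 0; pc < len; pc++) {
		/* Only conditional jumps carry jt/jf offsets. */
		if (BPF_CLASS(f[pc].code) != BPF_JMP ||
		    BPF_OP(f[pc].code) == BPF_JA)
			continue;
		/* Offsets count from the next instruction: pc + 1 + off. */
		if (pc + 1 + f[pc].jt >= len || pc + 1 + f[pc].jf >= len)
			return -1;
	}
	return 0;
}
```

Calling check_jumps(filter, sizeof(filter)/sizeof(filter[0])) just before
the prctl() in install_filter() would catch an off-by-one in the table
without having to attach the filter first.  Working through the 14
instructions above by hand, every target falls on instruction 13 or
earlier, so the check should return 0 for the filter as posted.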