On Mon, Jan 30, 2012 at 4:47 PM, Corey Bryant <coreyb@xxxxxxxxxxxxxxxxxx> wrote:
>
>
> On 01/28/2012 05:11 PM, Will Drewry wrote:
>>
>> Documents how system call filtering using Berkeley Packet
>> Filter programs works and how it may be used.
>> Includes an example for x86 (32-bit) and a semi-generic
>> example using an example code generator.
>>
>> v6: - tweak the language to note the requirement of
>>       PR_SET_NO_NEW_PRIVS being called prior to use. (luto@xxxxxxx)
>> v5: - update sample to use system call arguments
>>     - adds a "fancy" example using a macro-based generator
>>     - cleaned up bpf in the sample
>>     - update docs to mention arguments
>>     - fix prctl value (eparis@xxxxxxxxxx)
>>     - language cleanup (rdunlap@xxxxxxxxxxxx)
>> v4: - update for no_new_privs use
>>     - minor tweaks
>> v3: - call out BPF <-> Berkeley Packet Filter (rdunlap@xxxxxxxxxxxx)
>>     - document use of tentative always-unprivileged
>>     - guard sample compilation for i386 and x86_64
>> v2: - move code to samples (corbet@xxxxxxx)
>>
>> Signed-off-by: Will Drewry <wad@xxxxxxxxxxxx>
>> ---
>>  Documentation/prctl/seccomp_filter.txt |  100 +++++++++++++++
>>  samples/Makefile                       |    2 +-
>>  samples/seccomp/Makefile               |   27 ++++
>>  samples/seccomp/bpf-direct.c           |   77 +++++++++++
>>  samples/seccomp/bpf-fancy.c            |   95 ++++++++++++++
>>  samples/seccomp/bpf-helper.c           |   89 +++++++++++++
>>  samples/seccomp/bpf-helper.h           |  219 ++++++++++++++++++++++++++++++++
>>  7 files changed, 608 insertions(+), 1 deletions(-)
>>  create mode 100644 Documentation/prctl/seccomp_filter.txt
>>  create mode 100644 samples/seccomp/Makefile
>>  create mode 100644 samples/seccomp/bpf-direct.c
>>  create mode 100644 samples/seccomp/bpf-fancy.c
>>  create mode 100644 samples/seccomp/bpf-helper.c
>>  create mode 100644 samples/seccomp/bpf-helper.h
>>
>> diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
>> new file mode 100644
>> index 0000000..4ad7649
>> --- /dev/null
>> +++ b/Documentation/prctl/seccomp_filter.txt
>> @@ -0,0 +1,100 @@
>> +		Seccomp filtering
>> +		=================
>> +
>> +Introduction
>> +------------
>> +
>> +A large number of system calls are exposed to every userland process
>> +with many of them going unused for the entire lifetime of the process.
>> +As system calls change and mature, bugs are found and eradicated. A
>> +certain subset of userland applications benefit by having a reduced set
>> +of available system calls. The resulting set reduces the total kernel
>> +surface exposed to the application. System call filtering is meant for
>> +use with those applications.
>> +
>> +Seccomp filtering provides a means for a process to specify a filter for
>> +incoming system calls. The filter is expressed as a Berkeley Packet
>> +Filter (BPF) program, as with socket filters, except that the data
>> +operated on is related to the system call being made: system call
>> +number, and the system call arguments. This allows for expressive
>> +filtering of system calls using a filter program language with a long
>> +history of being exposed to userland and a straightforward data set.
>> +
>> +Additionally, BPF makes it impossible for users of seccomp to fall prey
>> +to time-of-check-time-of-use (TOCTOU) attacks that are common in system
>> +call interposition frameworks. BPF programs may not dereference
>> +pointers which constrains all filters to solely evaluating the system
>> +call arguments directly.
>> +
>> +What it isn't
>> +-------------
>> +
>> +System call filtering isn't a sandbox. It provides a clearly defined
>> +mechanism for minimizing the exposed kernel surface. Beyond that,
>> +policy for logical behavior and information flow should be managed with
>> +a combination of other system hardening techniques and, potentially, an
>> +LSM of your choosing. Expressive, dynamic filters provide further options down
>> +this path (avoiding pathological sizes or selecting which of the multiplexed
>> +system calls in socketcall() is allowed, for instance) which could be
>> +construed, incorrectly, as a more complete sandboxing solution.
>> +
>> +Usage
>> +-----
>> +
>> +An additional seccomp mode is added, but they are not directly set by
>> +the consuming process. The new mode, '2', is only available if
>> +CONFIG_SECCOMP_FILTER is set and enabled using prctl with the
>> +PR_ATTACH_SECCOMP_FILTER argument.
>> +
>> +Interacting with seccomp filters is done using one prctl(2) call.
>> +
>> +PR_ATTACH_SECCOMP_FILTER:
>> +	Allows the specification of a new filter using a BPF program.
>> +	The BPF program will be executed over struct seccomp_filter_data
>> +	reflecting the system call number, arguments, and other
>> +	metadata, To allow a system call, SECCOMP_BPF_ALLOW must be
>> +	returned. At present, all other return values result in the
>> +	system call being blocked, but it is recommended to return
>> +	SECCOMP_BPF_DENY in those cases. This will allow for future
>> +	custom return values to be introduced, if ever desired.
>> +
>> +	Usage:
>> +		prctl(PR_ATTACH_SECCOMP_FILTER, prog);
>> +
>> +	The 'prog' argument is a pointer to a struct sock_fprog which will
>> +	contain the filter program. If the program is invalid, the call
>> +	will return -1 and set errno to EINVAL.
>> +
>> +	Note, is_compat_task is also tracked for the @prog. This means
>> +	that once set the calling task will have all of its system calls
>> +	blocked if it switches its system call ABI.
>> +
>> +	If fork/clone and execve are allowed by @prog, any child processes will
>> +	be constrained to the same filters and system call ABI as the parent.
>> +
>> +	Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
>> +	run with CAP_SYS_ADMIN privileges in its namespace. If these are not
>> +	true, -EACCES will be returned. This requirement ensures that filter
>> +	programs cannot be applied to child processes with greater privileges
>> +	than the task that installed them.
>> +
>> +	Additionally, if prctl(2) is allowed by the attached filter,
>> +	additional filters may be layered on which will increase evaluation
>> +	time, but allow for further decreasing the attack surface during
>> +	execution of a process.
>> +
>> +The above call returns 0 on success and non-zero on error.
>> +
>> +Example
>> +-------
>> +
>> +The samples/seccomp/ directory contains both a 32-bit specific example
>> +and a more generic example of a higher level macro interface for BPF
>> +program generation.
>> +
>> +Adding architecture support
>> +-----------------------
>> +
>> +Any platform with seccomp support will support seccomp filters as long
>> +as CONFIG_SECCOMP_FILTER is enabled and the architecture has implemented
>> +syscall_get_arguments.
>> diff --git a/samples/Makefile b/samples/Makefile
>> index 6280817..f29b19c 100644
>> --- a/samples/Makefile
>> +++ b/samples/Makefile
>> @@ -1,4 +1,4 @@
>>  # Makefile for Linux samples code
>>
>>  obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ tracepoints/ trace_events/ \
>> -			   hw_breakpoint/ kfifo/ kdb/ hidraw/
>> +			   hw_breakpoint/ kfifo/ kdb/ hidraw/ seccomp/
>> diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile
>> new file mode 100644
>> index 0000000..0298c6f
>> --- /dev/null
>> +++ b/samples/seccomp/Makefile
>> @@ -0,0 +1,27 @@
>> +# kbuild trick to avoid linker error. Can be omitted if a module is built.
>> +obj- := dummy.o
>> +
>> +hostprogs-y := bpf-fancy
>> +bpf-fancy-objs := bpf-fancy.o bpf-helper.o
>> +
>> +HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include
>> +HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include
>> +HOSTCFLAGS_bpf-helper.o += -I$(objtree)/usr/include
>> +HOSTCFLAGS_bpf-helper.o += -idirafter $(objtree)/include
>> +
>> +# bpf-direct.c is x86-only.
>> +ifeq ($(filter-out x86_64 i386,$(KBUILD_BUILDHOST)),)
>> +# List of programs to build
>> +hostprogs-y += bpf-direct
>> +bpf-direct-objs := bpf-direct.o
>> +endif
>> +
>> +# Tell kbuild to always build the programs
>> +always := $(hostprogs-y)
>> +
>> +HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include
>> +HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include
>> +ifeq ($(KBUILD_BUILDHOST),x86_64)
>> +HOSTCFLAGS_bpf-direct.o += -m32
>> +HOSTLOADLIBES_bpf-direct += -m32
>> +endif
>> diff --git a/samples/seccomp/bpf-direct.c b/samples/seccomp/bpf-direct.c
>> new file mode 100644
>> index 0000000..d799244
>> --- /dev/null
>> +++ b/samples/seccomp/bpf-direct.c
>> @@ -0,0 +1,77 @@
>> +/*
>> + * 32-bit seccomp filter example with BPF macros
>> + *
>> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@xxxxxxxxxxxx>
>> + * Author: Will Drewry <wad@xxxxxxxxxxxx>
>> + *
>> + * The code may be used by anyone for any purpose,
>> + * and can serve as a starting point for developing
>> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
>> + */
>> +
>> +#include <linux/filter.h>
>> +#include <linux/ptrace.h>
>> +#include <linux/seccomp_filter.h>
>> +#include <linux/unistd.h>
>> +#include <stdio.h>
>> +#include <stddef.h>
>> +#include <sys/prctl.h>
>> +#include <unistd.h>
>> +
>> +#ifndef PR_ATTACH_SECCOMP_FILTER
>> +#	define PR_ATTACH_SECCOMP_FILTER 37
>> +#endif
>> +
>> +#define syscall_arg(_n) (offsetof(struct seccomp_filter_data, args[_n].lo32))
>> +#define nr (offsetof(struct seccomp_filter_data, syscall_nr))
>> +
>> +static int install_filter(void)
>> +{
>> +	struct seccomp_filter_block filter[] = {
>> +		/* Grab the system call number */
>> +		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, nr),
>> +		/* Jump table for the allowed syscalls */
>> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 10, 0),
>> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 9, 0),
>> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 8, 0),
>> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 7, 0),
>> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0),
>> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 2, 6),
>> +
>> +		/* Check that read is only using stdin. */
>> +		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)),
>> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDIN_FILENO, 3, 4),
>> +
>> +		/* Check that write is only using stdout/stderr */
>> +		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)),
>> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDOUT_FILENO, 1, 0),
>> +		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDERR_FILENO, 0, 1),
>> +
>> +		BPF_STMT(BPF_RET+BPF_K, SECCOMP_BPF_ALLOW),
>> +		BPF_STMT(BPF_RET+BPF_K, SECCOMP_BPF_DENY),
>> +	};
>> +	struct seccomp_fprog prog = {
>> +		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
>> +		.filter = filter,
>> +	};
>> +	if (prctl(PR_ATTACH_SECCOMP_FILTER, &prog)) {
>> +		perror("prctl");
>> +		return 1;
>> +	}
>> +	return 0;
>> +}
>> +
>> +#define payload(_c) (_c), sizeof((_c))
>> +int main(int argc, char **argv)
>> +{
>> +	char buf[4096];
>> +	ssize_t bytes = 0;
>> +	if (install_filter())
>> +		return 1;
>> +	syscall(__NR_write, STDOUT_FILENO,
>> +		payload("OHAI! WHAT IS YOUR NAME? "));
>> +	bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf));
>> +	syscall(__NR_write, STDOUT_FILENO, payload("HELLO, "));
>> +	syscall(__NR_write, STDOUT_FILENO, buf, bytes);
>> +	return 0;
>> +}
>> diff --git a/samples/seccomp/bpf-fancy.c b/samples/seccomp/bpf-fancy.c
>> new file mode 100644
>> index 0000000..1318b1a
>> --- /dev/null
>> +++ b/samples/seccomp/bpf-fancy.c
>> @@ -0,0 +1,95 @@
>> +/*
>> + * Seccomp BPF example using a macro-based generator.
>> + *
>> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@xxxxxxxxxxxx>
>> + * Author: Will Drewry <wad@xxxxxxxxxxxx>
>> + *
>> + * The code may be used by anyone for any purpose,
>> + * and can serve as a starting point for developing
>> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
>> + */
>> +
>> +#include <linux/seccomp_filter.h>
>> +#include <linux/unistd.h>
>> +#include <stdio.h>
>> +#include <string.h>
>> +#include <sys/prctl.h>
>> +#include <unistd.h>
>> +
>> +#include "bpf-helper.h"
>> +
>> +#ifndef PR_ATTACH_SECCOMP_FILTER
>> +#	define PR_ATTACH_SECCOMP_FILTER 37
>> +#endif
>> +
>> +int main(int argc, char **argv)
>> +{
>> +	struct bpf_labels l;
>> +	static const char msg1[] = "Please type something: ";
>> +	static const char msg2[] = "You typed: ";
>> +	char buf[256];
>> +	struct seccomp_filter_block filter[] = {
>> +		LOAD_SYSCALL_NR,
>> +		SYSCALL(__NR_exit, ALLOW),
>> +		SYSCALL(__NR_exit_group, ALLOW),
>> +		SYSCALL(__NR_write, JUMP(&l, write_fd)),
>> +		SYSCALL(__NR_read, JUMP(&l, read)),
>> +		DENY, /* Don't passthrough into a label */
>> +
>> +		LABEL(&l, read),
>> +		ARG(0),
>> +		JNE(STDIN_FILENO, DENY),
>> +		ARG(1),
>> +		JNE((unsigned long)buf, DENY),
>> +		ARG(2),
>> +		JGE(sizeof(buf), DENY),
>> +		ALLOW,
>> +
>> +		LABEL(&l, write_fd),
>> +		ARG(0),
>> +		JEQ(STDOUT_FILENO, JUMP(&l, write_buf)),
>> +		JEQ(STDERR_FILENO, JUMP(&l, write_buf)),
>> +		DENY,
>> +
>> +		LABEL(&l, write_buf),
>> +		ARG(1),
>> +		JEQ((unsigned long)msg1, JUMP(&l, msg1_len)),
>> +		JEQ((unsigned long)msg2, JUMP(&l, msg2_len)),
>> +		JEQ((unsigned long)buf, JUMP(&l, buf_len)),
>> +		DENY,
>> +
>> +		LABEL(&l, msg1_len),
>> +		ARG(2),
>> +		JLT(sizeof(msg1), ALLOW),
>> +		DENY,
>> +
>> +		LABEL(&l, msg2_len),
>> +		ARG(2),
>> +		JLT(sizeof(msg2), ALLOW),
>> +		DENY,
>> +
>> +		LABEL(&l, buf_len),
>> +		ARG(2),
>> +		JLT(sizeof(buf), ALLOW),
>> +		DENY,
>> +	};
>> +	struct seccomp_fprog prog = {
>> +		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
>> +		.filter = filter,
>> +	};
>> +	ssize_t bytes;
>> +	bpf_resolve_jumps(&l, filter, sizeof(filter)/sizeof(*filter));
>> +
>> +	if (prctl(PR_ATTACH_SECCOMP_FILTER, &prog)) {
>> +		perror("prctl");
>> +		return 1;
>> +	}
>> +	syscall(__NR_write, STDOUT_FILENO, msg1, strlen(msg1));
>> +	bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)-1);
>> +	bytes = (bytes > 0 ? bytes : 0);
>> +	syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2));
>> +	syscall(__NR_write, STDERR_FILENO, buf, bytes);
>> +	/* Now get killed */
>> +	syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)+2);
>> +	return 0;
>> +}
>> diff --git a/samples/seccomp/bpf-helper.c b/samples/seccomp/bpf-helper.c
>> new file mode 100644
>> index 0000000..e1b6bc7
>> --- /dev/null
>> +++ b/samples/seccomp/bpf-helper.c
>> @@ -0,0 +1,89 @@
>> +/*
>> + * Seccomp BPF helper functions
>> + *
>> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@xxxxxxxxxxxx>
>> + * Author: Will Drewry <wad@xxxxxxxxxxxx>
>> + *
>> + * The code may be used by anyone for any purpose,
>> + * and can serve as a starting point for developing
>> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
>> + */
>> +
>> +#include <stdio.h>
>> +#include <string.h>
>> +
>> +#include "bpf-helper.h"
>> +
>> +int bpf_resolve_jumps(struct bpf_labels *labels,
>> +		      struct seccomp_filter_block *filter, size_t count)
>> +{
>> +	struct seccomp_filter_block *begin = filter;
>> +	__u8 insn = count - 1;
>> +
>> +	if (count < 1)
>> +		return -1;
>> +	/*
>> +	 * Walk it once, backwards, to build the label table and do fixups.
>> +	 * Since backward jumps are disallowed by BPF, this is easy.
>> +	 */
>> +	filter += insn;
>> +	for (; filter >= begin; --insn, --filter) {
>> +		if (filter->code != (BPF_JMP+BPF_JA))
>> +			continue;
>> +		switch ((filter->jt<<8)|filter->jf) {
>> +		case (JUMP_JT<<8)|JUMP_JF:
>> +			if (labels->labels[filter->k].location == 0xffffffff) {
>> +				fprintf(stderr, "Unresolved label: '%s'\n",
>> +					labels->labels[filter->k].label);
>> +				return 1;
>> +			}
>> +			filter->k = labels->labels[filter->k].location -
>> +				    (insn + 1);
>> +			filter->jt = 0;
>> +			filter->jf = 0;
>> +			continue;
>> +		case (LABEL_JT<<8)|LABEL_JF:
>> +			if (labels->labels[filter->k].location != 0xffffffff) {
>> +				fprintf(stderr, "Duplicate label use: '%s'\n",
>> +					labels->labels[filter->k].label);
>> +				return 1;
>> +			}
>> +			labels->labels[filter->k].location = insn;
>> +			filter->k = 0; /* fall through */
>> +			filter->jt = 0;
>> +			filter->jf = 0;
>> +			continue;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +/* Simple lookup table for labels. */
>> +__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label)
>> +{
>> +	struct __bpf_label *begin = labels->labels, *end;
>> +	int id;
>> +	if (labels->count == 0) {
>> +		begin->label = label;
>> +		begin->location = 0xffffffff;
>> +		labels->count++;
>> +		return 0;
>> +	}
>> +	end = begin + labels->count;
>> +	for (id = 0; begin < end; ++begin, ++id) {
>> +		if (!strcmp(label, begin->label))
>> +			return id;
>> +	}
>> +	begin->label = label;
>> +	begin->location = 0xffffffff;
>> +	labels->count++;
>> +	return id;
>> +}
>> +
>> +void seccomp_bpf_print(struct seccomp_filter_block *filter, size_t count)
>> +{
>> +	struct seccomp_filter_block *end = filter + count;
>> +	for ( ; filter < end; ++filter)
>> +		printf("{ code=%u,jt=%u,jf=%u,k=%u },\n",
>> +			filter->code, filter->jt, filter->jf, filter->k);
>> +}
>> diff --git a/samples/seccomp/bpf-helper.h b/samples/seccomp/bpf-helper.h
>> new file mode 100644
>> index 0000000..92b94ec
>> --- /dev/null
>> +++ b/samples/seccomp/bpf-helper.h
>> @@ -0,0 +1,219 @@
>> +/*
>> + * Example wrapper around BPF macros.
>> + *
>> + * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@xxxxxxxxxxxx>
>> + * Author: Will Drewry <wad@xxxxxxxxxxxx>
>> + *
>> + * The code may be used by anyone for any purpose,
>> + * and can serve as a starting point for developing
>> + * applications using prctl(PR_ATTACH_SECCOMP_FILTER).
>> + *
>> + * No guarantees are provided with respect to the correctness
>> + * or functionality of this code.
>> + */
>> +#ifndef __BPF_HELPER_H__
>> +#define __BPF_HELPER_H__
>> +
>> +#include <asm/bitsperlong.h>	/* for __BITS_PER_LONG */
>> +#include <linux/filter.h>
>> +#include <linux/seccomp_filter.h>	/* for seccomp_filter_data.arg */
>> +#include <linux/types.h>
>> +#include <linux/unistd.h>
>> +#include <stddef.h>
>> +
>> +#define BPF_LABELS_MAX 256
>> +struct bpf_labels {
>> +	int count;
>> +	struct __bpf_label {
>> +		const char *label;
>> +		__u32 location;
>> +	} labels[BPF_LABELS_MAX];
>> +};
>> +
>> +int bpf_resolve_jumps(struct bpf_labels *labels,
>> +		      struct seccomp_filter_block *filter, size_t count);
>> +__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label);
>> +void seccomp_bpf_print(struct seccomp_filter_block *filter, size_t count);
>> +
>> +#define JUMP_JT 0xff
>> +#define JUMP_JF 0xff
>> +#define LABEL_JT 0xfe
>> +#define LABEL_JF 0xfe
>> +
>> +#define ALLOW \
>> +	BPF_STMT(BPF_RET+BPF_K, 0xFFFFFFFF)
>> +#define DENY \
>> +	BPF_STMT(BPF_RET+BPF_K, 0)
>> +#define JUMP(labels, label) \
>> +	BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \
>> +		 JUMP_JT, JUMP_JF)
>> +#define LABEL(labels, label) \
>> +	BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \
>> +		 LABEL_JT, LABEL_JF)
>> +#define SYSCALL(nr, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (nr), 0, 1), \
>> +	jt
>> +
>> +/* Lame, but just an example */
>> +#define FIND_LABEL(labels, label) seccomp_bpf_label((labels), #label)
>> +
>> +#define EXPAND(...) __VA_ARGS__
>> +/* Map all width-sensitive operations */
>> +#if __BITS_PER_LONG == 32
>> +
>> +#define JEQ(x, jt) JEQ32(x, EXPAND(jt))
>> +#define JNE(x, jt) JNE32(x, EXPAND(jt))
>> +#define JGT(x, jt) JGT32(x, EXPAND(jt))
>> +#define JLT(x, jt) JLT32(x, EXPAND(jt))
>> +#define JGE(x, jt) JGE32(x, EXPAND(jt))
>> +#define JLE(x, jt) JLE32(x, EXPAND(jt))
>> +#define JA(x, jt) JA32(x, EXPAND(jt))
>> +#define ARG(i) ARG_32(i)
>> +
>> +#elif __BITS_PER_LONG == 64
>> +
>> +#define JEQ(x, jt) \
>> +	JEQ64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
>> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
>> +	      EXPAND(jt))
>> +#define JGT(x, jt) \
>> +	JGT64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
>> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
>> +	      EXPAND(jt))
>> +#define JGE(x, jt) \
>> +	JGE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
>> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
>> +	      EXPAND(jt))
>> +#define JNE(x, jt) \
>> +	JNE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
>> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
>> +	      EXPAND(jt))
>> +#define JLT(x, jt) \
>> +	JLT64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
>> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
>> +	      EXPAND(jt))
>> +#define JLE(x, jt) \
>> +	JLE64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
>> +	      ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
>> +	      EXPAND(jt))
>> +
>> +#define JA(x, jt) \
>> +	JA64(((union seccomp_filter_arg){.u64 = (x)}).lo32, \
>> +	     ((union seccomp_filter_arg){.u64 = (x)}).hi32, \
>> +	     EXPAND(jt))
>> +#define ARG(i) ARG_64(i)
>> +
>> +#else
>> +#error __BITS_PER_LONG value unusable.
>> +#endif
>> +
>> +/* Loads the arg into A */
>> +#define ARG_32(idx) \
>> +	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
>> +		offsetof(struct seccomp_filter_data, args[(idx)].lo32))
>> +
>> +/* Loads hi into A and lo in X */
>> +#define ARG_64(idx) \
>> +	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
>> +		offsetof(struct seccomp_filter_data, args[(idx)].lo32)), \
>> +	BPF_STMT(BPF_ST, 0), /* lo -> M[0] */ \
>> +	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
>> +		offsetof(struct seccomp_filter_data, args[(idx)].hi32)), \
>> +	BPF_STMT(BPF_ST, 1) /* hi -> M[1] */
>> +
>> +#define JEQ32(value, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 0, 1), \
>> +	jt
>> +
>> +#define JNE32(value, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 1, 0), \
>> +	jt
>> +
>> +/* Checks the lo, then swaps to check the hi. A=lo,X=hi */
>> +#define JEQ64(lo, hi, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
>> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 0, 2), \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
>> +	jt, \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
>> +
>> +#define JNE64(lo, hi, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 5, 0), \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
>> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 2, 0), \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
>> +	jt, \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
>> +
>> +#define JA32(value, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (value), 0, 1), \
>> +	jt
>> +
>> +#define JA64(lo, hi, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (hi), 3, 0), \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
>> +	BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (lo), 0, 2), \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
>> +	jt, \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
>> +
>> +#define JGE32(value, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 0, 1), \
>> +	jt
>> +
>> +#define JLT32(value, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 1, 0), \
>> +	jt
>> +
>> +/* Shortcut checking if hi > arg.hi. */
>> +#define JGE64(lo, hi, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
>> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
>> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (lo), 0, 2), \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
>> +	jt, \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
>> +
>> +#define JLT64(lo, hi, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (hi), 0, 4), \
>> +	BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
>> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
>> +	jt, \
>> +	BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
>> +
>> +#define JGT32(value, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
>> +	jt
>> +
>> +#define JLE32(value, jt) \
>> +	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
>> +	jt
>
>
> Should the true/false offsets be reversed here?

Looks that way :)

> Thanks for all the work on this. We're looking forward to using it with
> QEMU.

Definitely - thanks!
will
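
For reference, here is a minimal sketch of what the reversed JLE32 offsets might look like. It is not taken from the patch. JLE32_FIXED, JLE32_AS_POSTED, RET_ALLOW, RET_DENY, run() and the two allowed_*() helpers are hypothetical names; struct sock_filter from <linux/filter.h> stands in for the patch's struct seccomp_filter_block; and run() is a toy user-space evaluator that handles only the two opcodes used here, with A preloaded by the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>	/* struct sock_filter, BPF_STMT, BPF_JUMP */

/*
 * Hypothetical fix: "A <= value" is the negation of "A > value", so the
 * jt/jf offsets are swapped relative to JGT32 (skip jt when A > value).
 */
#define JLE32_FIXED(value, jt) \
	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 1, 0), \
	jt

/* Offsets as posted in the patch, kept only for comparison. */
#define JLE32_AS_POSTED(value, jt) \
	BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
	jt

#define RET_ALLOW BPF_STMT(BPF_RET+BPF_K, 0xFFFFFFFF)
#define RET_DENY  BPF_STMT(BPF_RET+BPF_K, 0)

/* Toy evaluator: supports only JGT-with-constant and RET-with-constant. */
static uint32_t run(const struct sock_filter *f, size_t len, uint32_t a)
{
	size_t pc = 0;

	while (pc < len) {
		const struct sock_filter *i = &f[pc++];

		switch (i->code) {
		case BPF_JMP+BPF_JGT+BPF_K:
			/* Offsets are relative to the next instruction. */
			pc += (a > i->k) ? i->jt : i->jf;
			break;
		case BPF_RET+BPF_K:
			return i->k;
		default:
			return 0;	/* unsupported opcode: treat as deny */
		}
	}
	return 0;
}

/* Nonzero if the fixed macro lets "arg" through for the given limit. */
static int allowed_fixed(uint32_t arg, uint32_t limit)
{
	struct sock_filter prog[] = {
		JLE32_FIXED(limit, RET_ALLOW),
		RET_DENY,
	};
	return run(prog, sizeof(prog)/sizeof(prog[0]), arg) != 0;
}

/* Same check using the offsets as posted. */
static int allowed_as_posted(uint32_t arg, uint32_t limit)
{
	struct sock_filter prog[] = {
		JLE32_AS_POSTED(limit, RET_ALLOW),
		RET_DENY,
	};
	return run(prog, sizeof(prog)/sizeof(prog[0]), arg) != 0;
}
```

With a limit of 5, allowed_fixed() accepts 3 and 5 and rejects 6, whereas allowed_as_posted() behaves like JGT32 and accepts only 6.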