On Tue, Jun 02, 2020 at 11:34:04AM +0000, zhujianwei (C) wrote:
> And in many scenarios, the requirement for syscall filter is usually
> simple, and does not need complex filter rules, for example, just
> configure a syscall black or white list. However, we have noticed that
> seccomp will have a performance overhead that cannot be ignored in this
> simple scenario. For example, referring to Kees's test data, this cost
> is almost 41/636 = 6.5%, and Alex's data is 17/226 = 7.5%, based on a
> single rule of filtering (getpid); our data for this overhead is 19.8%
> (refer to the previous 'original' test results), filtering based on our
> 20 rules (unixbench syscall).

I wonder if aarch64 has higher overhead for calling into the TIF_WORK
trace stuff? (Or if aarch64's BPF JIT is not as efficient as x86's?)

> // kernel modification
> --- linux-5.7-rc7_1/arch/arm64/kernel/ptrace.c	2020-05-25 06:32:54.000000000 +0800
> +++ linux-5.7-rc7/arch/arm64/kernel/ptrace.c	2020-06-02 12:35:04.412000000 +0800
> @@ -1827,6 +1827,46 @@
>  	regs->regs[regno] = saved_reg;
>  }
>  
> +#define PID_MAX 1000000
> +#define SYSNUM_MAX 0x220

You can use NR_syscalls here, I think.

> +
> +/* all zero */
> +bool g_light_filter_switch[PID_MAX] = {0};
> +bool g_light_filter_bitmap[PID_MAX][SYSNUM_MAX] = {0};

These can be static, and I would double-check your allocation size -- I
suspect this is allocating a byte for each bool. I would recommend
DECLARE_BITMAP() and friends.
> +static int __light_syscall_filter(void) {
> +	int pid;
> +	int this_syscall;
> +
> +	pid = current->pid;
> +	this_syscall = syscall_get_nr(current, task_pt_regs(current));
> +
> +	if (g_light_filter_bitmap[pid][this_syscall] == true) {
> +		printk(KERN_ERR "light syscall filter: syscall num %d denied.\n", this_syscall);
> +		goto skip;
> +	}
> +
> +	return 0;
> +skip:
> +	return -1;
> +}
> +
> +static inline int light_syscall_filter(void) {
> +	if (unlikely(test_thread_flag(TIF_SECCOMP))) {
> +		return __light_syscall_filter();
> +	}
> +
> +	return 0;
> +}
> +
>  int syscall_trace_enter(struct pt_regs *regs)
>  {
>  	unsigned long flags = READ_ONCE(current_thread_info()->flags);
> @@ -1837,9 +1877,10 @@
>  		return -1;
>  	}
>  
> -	/* Do the secure computing after ptrace; failures should be fast. */
> -	if (secure_computing() == -1)
> +	/* light check for syscall-num-only rule. */
> +	if (light_syscall_filter() == -1) {
>  		return -1;
> +	}
>  
>  	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
>  		trace_sys_enter(regs, regs->syscallno);

Given that you're still doing this in syscall_trace_enter(), I imagine
it could live in secure_computing().

Anyway, the functionality here is similar to what I've been working on
for bitmaps (having a global preallocated bitmap isn't going to be
upstreamable, but it's good for a PoC). The complications are with
handling differing architectures (for compat systems), tracking/choosing
between the various basic SECCOMP_RET_* behaviors, etc.

-Kees

-- 
Kees Cook