On Wed, Sep 30, 2020 at 5:20 PM YiFei Zhu <zhuyifei1999@xxxxxxxxx> wrote:
> SECCOMP_CACHE_NR_ONLY will only operate on syscalls that do not
> access any syscall arguments or instruction pointer. To facilitate
> this we need a static analyser to know whether a filter will
> return allow regardless of syscall arguments for a given
> architecture number / syscall number pair. This is implemented
> here with a pseudo-emulator, and stored in a per-filter bitmap.
>
> Each common BPF instruction are emulated. Any weirdness or loading
> from a syscall argument will cause the emulator to bail.
>
> The emulation is also halted if it reaches a return. In that case,
> if it returns an SECCOMP_RET_ALLOW, the syscall is marked as good.
>
> Emulator structure and comments are from Kees [1] and Jann [2].
>
> Emulation is done at attach time. If a filter depends on more
> filters, and if the dependee does not guarantee to allow the
> syscall, then we skip the emulation of this syscall.
>
> [1] https://lore.kernel.org/lkml/20200923232923.3142503-5-keescook@xxxxxxxxxxxx/
> [2] https://lore.kernel.org/lkml/CAG48ez1p=dR_2ikKq=xVxkoGg0fYpTBpkhJSv1w-6BG=76PAvw@xxxxxxxxxxxxxx/
[...]
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 1ab22869a765..ff5289228ea5 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -150,6 +150,7 @@ config X86
>         select HAVE_ARCH_COMPAT_MMAP_BASES if MMU && COMPAT
>         select HAVE_ARCH_PREL32_RELOCATIONS
>         select HAVE_ARCH_SECCOMP_FILTER
> +       select HAVE_ARCH_SECCOMP_CACHE_NR_ONLY
>         select HAVE_ARCH_THREAD_STRUCT_WHITELIST
>         select HAVE_ARCH_STACKLEAK
>         select HAVE_ARCH_TRACEHOOK

If you did the architecture enablement for X86 later in the series,
you could move this part over into that patch, that'd be cleaner.
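
As an aside for anyone following the thread: the "pseudo-emulator" idea
from the quoted commit message can be sketched in userspace roughly as
below. This is only an illustration under loud assumptions: `struct insn`,
the `OP_*` opcodes, `is_const_allow()` and `example_filter` are all
hypothetical simplifications invented here, not the kernel's
`struct sock_filter` encoding and not the code in this patch. Only the two
return values match the real SECCOMP_RET_ALLOW / SECCOMP_RET_KILL_THREAD
constants.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for classic BPF opcodes. */
enum op { OP_LD_NR, OP_LD_ARCH, OP_LD_ARG, OP_JEQ_K, OP_RET_K };

struct insn {
	enum op  op;
	uint8_t  jt, jf;	/* jump offsets, relative to the next insn */
	uint32_t k;		/* immediate operand */
};

#define RET_ALLOW 0x7fff0000u	/* value of SECCOMP_RET_ALLOW */
#define RET_KILL  0x00000000u	/* value of SECCOMP_RET_KILL_THREAD */

/*
 * Walk the program for one fixed (nr, arch) pair. Returns true only if
 * a "return allow" is reached without ever touching a syscall argument.
 */
static bool is_const_allow(const struct insn *prog, size_t len,
			   uint32_t nr, uint32_t arch)
{
	uint32_t acc = 0;

	for (size_t pc = 0; pc < len; pc++) {
		const struct insn *i = &prog[pc];

		switch (i->op) {
		case OP_LD_NR:
			acc = nr;
			break;
		case OP_LD_ARCH:
			acc = arch;
			break;
		case OP_JEQ_K:
			pc += (acc == i->k) ? i->jt : i->jf;
			break;
		case OP_RET_K:
			return i->k == RET_ALLOW;
		default:
			/* Argument load or anything unexpected: bail. */
			return false;
		}
	}
	return false;	/* ran off the end of the program */
}

/* Allow nr 0 outright, inspect an argument for nr 1, kill the rest. */
static const struct insn example_filter[] = {
	{ OP_LD_NR,  0, 0, 0 },
	{ OP_JEQ_K,  0, 1, 0 },
	{ OP_RET_K,  0, 0, RET_ALLOW },
	{ OP_JEQ_K,  0, 2, 1 },
	{ OP_LD_ARG, 0, 0, 0 },
	{ OP_RET_K,  0, 0, RET_ALLOW },
	{ OP_RET_K,  0, 0, RET_KILL },
};
```

With this toy filter, only syscall number 0 would be marked in the bitmap:
number 1 reaches an argument load and makes the walk bail, and every other
number ends in a non-allow return.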

> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index ae6b40cc39f4..f09c9e74ae05 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -143,6 +143,37 @@ struct notification {
>         struct list_head notifications;
>  };
>
> +#ifdef CONFIG_SECCOMP_CACHE_NR_ONLY
> +/**
> + * struct seccomp_cache_filter_data - container for cache's per-filter data
> + *
> + * Tis struct is ordered to minimize padding holes.

I think this comment can probably go away, there isn't really much
trickery around padding holes in the struct as it is now.

> + * @syscall_allow_default: A bitmap where each bit represents whether the
> + *                         filter willalways allow the syscall, for the

nit: s/willalways/will always/

[...]
> +static void seccomp_cache_prepare_bitmap(struct seccomp_filter *sfilter,
> +                                         void *bitmap, const void *bitmap_prev,
> +                                         size_t bitmap_size, int arch)
> +{
> +       struct sock_fprog_kern *fprog = sfilter->prog->orig_prog;
> +       struct seccomp_data sd;
> +       int nr;
> +
> +       for (nr = 0; nr < bitmap_size; nr++) {
> +               if (bitmap_prev && !test_bit(nr, bitmap_prev))
> +                       continue;
> +
> +               sd.nr = nr;
> +               sd.arch = arch;
> +
> +               if (seccomp_emu_is_const_allow(fprog, &sd))
> +                       set_bit(nr, bitmap);

set_bit() is atomic, but since we only do this at filter setup, before
the filter becomes globally visible, we don't need atomicity here. So
this should probably use __set_bit() instead.
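
To illustrate the distinction, here is a rough userspace analogue using
C11 atomics. The helper names (`bitmap_set_nonatomic()`,
`bitmap_set_atomic()`, `bitmap_test()`) are made up for this sketch and
are not the kernel's bitops implementation; they only show the difference
between a plain read-modify-write and an atomic one.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Plain read-modify-write, analogous to __set_bit(): fine as long as
 * only one thread can reach the bitmap, e.g. during setup.
 */
static void bitmap_set_nonatomic(unsigned long *map, size_t nr)
{
	map[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/*
 * Atomic read-modify-write, analogous to set_bit(): only needed when
 * other threads may update the same word concurrently.
 */
static void bitmap_set_atomic(_Atomic unsigned long *map, size_t nr)
{
	atomic_fetch_or(&map[nr / BITS_PER_WORD],
			1UL << (nr % BITS_PER_WORD));
}

/* Read back a single bit from a non-atomic bitmap. */
static bool bitmap_test(const unsigned long *map, size_t nr)
{
	return (map[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}
```

Both variants leave the same bits set; the atomic one just pays for
ordering guarantees that nothing relies on while the bitmap is still
private to the thread building it.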