On Mon, Jun 01, 2020 at 12:11:37PM +0200, Lennart Poettering wrote:
> On Fr, 29.05.20 12:27, Kees Cook (keescook@xxxxxxxxxxxx) wrote:
>
> > # grep ^Seccomp_filters /proc/$(pidof systemd-resolved)/status
> > Seccomp_filters:        32
> >
> > # grep SystemCall /lib/systemd/system/systemd-resolved.service
> > SystemCallArchitectures=native
> > SystemCallErrorNumber=EPERM
> > SystemCallFilter=@system-service
> >
> > I'd like to better understand what they're doing, but haven't had time
> > to dig in. (The systemd devel mailing list requires subscription, so
> > I've directly CCed some systemd folks that have touched seccomp there
> > recently. Hi! The starts of this thread is here[4].)
>
> Hmm, so on x86-64 we try to install our seccomp filters three times:
> for the x86-64 syscall ABI, for the i386 syscall ABI and for the x32
> syscall ABI. Not all of the filters we apply work on all ABIs though,
> because syscalls are available on some but not others, or cannot
> sensibly be matched on some (because of socketcall, ipc and such
> multiplexed syscalls).
>
> [...]

Thanks for the details on this! That helps me understand what's
happening much better. :)

> An easy improvement is probably if libseccomp would now start refusing
> to install x32 seccomp filters altogether now that x32 is entirely
> dead? Or are the entrypoints for x32 syscalls still available in the
> kernel? How could userspace figure out if they are available? If
> libseccomp doesn't want to add code for that, we probably could have
> that in systemd itself too...

Would it make sense to provide a systemd setting for services to declare
"no compat" or "no x32" (I'm not sure what to call this mode more
generically, "no 32-bit allocation ABI"?) Then you can just install
a single merged filter for all the native syscalls that starts with
"if not native, reject"?

(Or better yet: make the default for filtering be "native only", and
let services opt into other ABIs?)

-- 
Kees Cook
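
For illustration only, here is a rough model of what a merged filter
starting with "if not native, reject" might decide on x86-64. This is a
plain Python simulation, not real BPF and not what systemd or libseccomp
actually emit. The names SeccompData, is_native_x86_64 and merged_filter
are hypothetical. The constants are the kernel's AUDIT_ARCH_* values and
the x32 syscall bit. Note that x32 reports the same arch value as x86-64,
so the sketch assumes the native check also has to look at the syscall
number.

```python
from dataclasses import dataclass

# Values from the kernel UAPI headers (linux/audit.h, asm/unistd.h).
AUDIT_ARCH_X86_64 = 0xC000003E
AUDIT_ARCH_I386 = 0x40000003
X32_SYSCALL_BIT = 0x40000000  # set in the syscall number for x32 calls


@dataclass(frozen=True)
class SeccompData:
    """Hypothetical stand-in for the arch/nr fields of struct seccomp_data."""
    arch: int
    nr: int


def is_native_x86_64(data):
    # x32 shares AUDIT_ARCH_X86_64, so the arch check alone is not enough:
    # the x32 bit in the syscall number has to be clear as well.
    return data.arch == AUDIT_ARCH_X86_64 and not (data.nr & X32_SYSCALL_BIT)


def merged_filter(data, allowed_native_syscalls, errno_on_reject="EPERM"):
    """Model of a single filter: reject non-native first, then one allow-list."""
    if not is_native_x86_64(data):
        return errno_on_reject
    if data.nr in allowed_native_syscalls:
        return "ALLOW"
    return errno_on_reject


if __name__ == "__main__":
    allowed = {0, 1}  # read and write on x86-64, chosen only as an example
    print(merged_filter(SeccompData(AUDIT_ARCH_X86_64, 1), allowed))
    print(merged_filter(SeccompData(AUDIT_ARCH_I386, 4), allowed))
    print(merged_filter(SeccompData(AUDIT_ARCH_X86_64, X32_SYSCALL_BIT | 1), allowed))
```

With the non-native ABIs rejected up front, only one allow-list needs to
be evaluated, instead of one set of filters per ABI.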