On Thu, Mar 14, 2019 at 11:58:15AM +0100, Paolo Bonzini wrote:
> On 14/03/19 00:44, Andrea Arcangeli wrote:
> > Then I thought we can add a tristate so an open of /dev/kvm would also
> > allow the syscall to make things more user friendly because
> > unprivileged containers ideally should have writable mounts done with
> > nodev and no matter the privilege they shouldn't ever get an hold on
> > the KVM driver (and those who do, like kubevirt, will then just work).
>
> I wouldn't even bother with the KVM special case.  Containers can use
> seccomp if they want a fine-grained policy.

We can have a single boolean 0|1 and stick to a simpler sysctl with no
gid: if you want to use userfaultfd you need to enable it for all
users. I agree seccomp already provides more than enough granularity
for more fine-grained choices.

So this will be for those who are paranoid and prefer to disable
userfaultfd as a whole as a hardening feature, like the bpf sysctl
allows: it will make it possible to block the uffd syscall without
having to rebuild the kernel with CONFIG_USERFAULTFD=n in environments
where seccomp cannot be easily enabled (i.e. without requiring
userland changes).

That's very fine with me, but then it wasn't me complaining in the
first place. Kees?

If the above is ok, we can implement it as a static key. Not that the
syscall itself is particularly performance critical, but it'll be
simple enough as a boolean (only the ioctls are performance critical,
but those are unaffected).
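To make the boolean semantics concrete, here is a rough userspace model
of the gate, not kernel code. All names are hypothetical, and the
-EPERM return value is only an assumption since the error code has not
been discussed. In the kernel the flag would presumably be a static key
tested with static_branch_unlikely() at syscall entry rather than a
plain bool.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the proposed 0|1 sysctl (no gid, no tristate). */
static bool uffd_enabled = true;

/* Model of a sysctl write: only 0 and 1 are accepted. */
int uffd_sysctl_write(int value)
{
	if (value != 0 && value != 1)
		return -EINVAL;
	uffd_enabled = (value == 1);
	return 0;
}

/* Model of the syscall entry: a single boolean check, then the usual path. */
long uffd_syscall_model(void)
{
	if (!uffd_enabled)
		return -EPERM;	/* assumed error code, not settled in this thread */
	return 3;		/* placeholder for a newly created file descriptor */
}

/* Model of an ioctl on an already-open descriptor: the gate is not consulted. */
long uffd_ioctl_model(void)
{
	return 0;
}

/* Walks through the on -> off -> on sequence; returns 0 if all checks hold. */
int uffd_demo(void)
{
	assert(uffd_syscall_model() == 3);
	assert(uffd_sysctl_write(0) == 0);
	assert(uffd_syscall_model() == -EPERM);
	assert(uffd_ioctl_model() == 0);
	assert(uffd_sysctl_write(1) == 0);
	assert(uffd_syscall_model() == 3);
	return 0;
}
```

The point of the model is that the check sits only on the syscall path,
so descriptors that are already open keep working and the ioctl path
pays nothing.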

The blog post about UAF is not particularly interesting in my view,
unless both of the following points are true:

1) it can also be proven that the very same two UAF bugs cannot be
   exploited by other means (as far as I can tell they can be exploited
   by other means regardless of userfaultfd), and

2) the slab randomization was actually enabled. 99% of the time in
   PoCs, all randomization features like kaslr are incidentally
   disabled first to facilitate publishing papers and blog posts, but
   those are really the features intended to reduce the reproducibility
   of exploits against UAF bugs, not disabling userfaultfd, which only
   provides a minor advantage. Unlike in PoC environments, we enable
   slab randomization in production 100% of the time whenever it is
   available in the kernel.

Thanks,
Andrea