On Wed, May 20, 2020 at 12:59:38AM -0400, Andrea Arcangeli wrote:
> Hello everyone,
>
> On Fri, May 08, 2020 at 12:54:03PM -0400, Michael S. Tsirkin wrote:
> > On Fri, May 08, 2020 at 12:52:34PM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Apr 22, 2020 at 05:26:32PM -0700, Daniel Colascione wrote:
> > > > This sysctl can be set to either zero or one. When zero (the default)
> > > > the system lets all users call userfaultfd with or without
> > > > UFFD_USER_MODE_ONLY, modulo other access controls. When
> > > > unprivileged_userfaultfd_user_mode_only is set to one, users without
> > > > CAP_SYS_PTRACE must pass UFFD_USER_MODE_ONLY to userfaultfd or the API
> > > > will fail with EPERM. This facility allows administrators to reduce
> > > > the likelihood that an attacker with access to userfaultfd can delay
> > > > faulting kernel code to widen timing windows for other exploits.
> > > >
> > > > Signed-off-by: Daniel Colascione <dancol@xxxxxxxxxx>
> > >
> > > The approach taken looks like a hard-coded security policy.
> > > For example, it won't be possible to set the sysctl knob
> > > in question on any system running kvm. So this is
> > > no good for any general purpose system.

Not all systems run unprivileged KVM. :)

> > > What's wrong with using a security policy for this instead?
> >
> > In fact I see the original thread already mentions selinux,
> > so it's just a question of making this controllable by
> > selinux.
>
> I agree it'd be preferable if it was not hardcoded, but then this
> patchset is also much simpler than the previous controlling it through
> selinux..
>
> I was thinking, an alternative policy that could control it without
> hard-coding it, is a seccomp-bpf filter, then you can drop 2/2 as
> well, not just 1/6-4/6.

Err, did I miss a separate 6-patch series? I can't find anything on lore.

>
> If you keep only 1/2, can't seccomp-bpf enforce userfaultfd to be
> always called with flags==0x1 without requiring extra modifications in
> the kernel?

Please no. This is way too much overhead for something that a system
owner wants to enforce globally. A sysctl is the correct option here,
IMO. If it needs to be a per-userns sysctl, that would be fine too.

> Can't you get the feature parity with the CAP_SYS_PTRACE capability
> too, if you don't wrap those tasks with the ptrace capability under
> that seccomp filter?
>
> As far as I can tell, it's unprecedented to create a flag for a
> syscall API, with the only purpose of implementing a seccomp-bpf
> filter verifying such flag is set, but then if you want to control it
> with LSM it's even more complex than doing it with seccomp-bpf, and it
> requires more kernel code too. We could always add 2/2 later, such
> possibility won't disappear, in fact we could also add 1/6-4/6 later
> too if that is not enough.
>
> If we could begin by merging only 1/2 from this new series and be done
> with the kernel changes, because we offload the rest of the work to
> the kernel eBPF JIT, I think it'd be ideal.

I'd agree that patch 1 should land, as it appears to be required for any
further policy considerations. I'm still a big fan of a sysctl since
this is the kind of thing I would absolutely turn on globally for all my
systems.

-- 
Kees Cook
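
For illustration only, the seccomp-bpf alternative discussed above might
look roughly like the sketch below. It is not taken from the patch series.
Assumptions: the task runs on little-endian x86-64, UFFD_USER_MODE_ONLY has
the value 0x1 (inferred from the "flags==0x1" remark in the thread), and
the function name install_uffd_user_mode_filter() is hypothetical. The
filter tests that the flag bit is set instead of requiring strict equality,
so that O_CLOEXEC and O_NONBLOCK can still be passed. It does not implement
any CAP_SYS_PTRACE exemption; as suggested in the thread, that would be
handled by not installing the filter on such tasks.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Assumed value, taken from "flags==0x1" in the thread. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/* Classic BPF program evaluated by seccomp on every syscall. */
static struct sock_filter uffd_filter[] = {
    /* [0] Load the audit architecture of the calling syscall ABI. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* [1] x86-64: skip the kill below. Anything else falls through. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    /* [2] Unexpected architecture (sketch is x86-64 only). */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* [3] Load the syscall number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [4] Not userfaultfd: jump ahead 3 instructions to ALLOW at [8]. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_userfaultfd, 0, 3),
    /* [5] Load the low 32 bits of the flags argument (little-endian). */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
    /* [6] Flag bit set: jump to ALLOW at [8]. Otherwise fall through. */
    BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, UFFD_USER_MODE_ONLY, 1, 0),
    /* [7] userfaultfd without UFFD_USER_MODE_ONLY fails with EPERM. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    /* [8] Everything else is allowed. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/*
 * Hypothetical helper: install the filter on the calling thread.
 * Returns 0 on success, -1 with errno set on failure.
 */
int install_uffd_user_mode_filter(void)
{
    struct sock_fprog prog = {
        .len = sizeof(uffd_filter) / sizeof(uffd_filter[0]),
        .filter = uffd_filter,
    };

    /* Required so an unprivileged task may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A filter like this applies only to the task that installs it and to that
task's descendants, so it has to be set up by every launcher or service
manager. That is the overhead objected to above, compared with a single
system-wide sysctl.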