On Sun, Oct 13, 2019 at 3:14 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> [adding more people because this is going to be an ABI break, sigh]
> On Sat, Oct 12, 2019 at 5:52 PM Daniel Colascione <dancol@xxxxxxxxxx> wrote:
> > On Sat, Oct 12, 2019 at 4:10 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@xxxxxxxxxx> wrote:
> > > > The new secure flag makes userfaultfd use a new "secure" anonymous
> > > > file object instead of the default one, letting security modules
> > > > supervise userfaultfd use.
> > > >
> > > > Requiring that users pass a new flag lets us avoid changing the
> > > > semantics for existing callers.
> > >
> > > Is there any good reason not to make this be the default?
> > >
> > >
> > > The only downside I can see is that it would increase the memory usage
> > > of userfaultfd(), but that doesn't seem like such a big deal. A
> > > lighter-weight alternative would be to have a single inode shared by
> > > all userfaultfd instances, which would require a somewhat different
> > > internal anon_inode API.
> >
> > I'd also prefer to just make SELinux use mandatory, but there's a
> > nasty interaction with UFFD_EVENT_FORK. Adding a new UFFD_SECURE mode
> > which blocks UFFD_EVENT_FORK sidesteps this problem. Maybe you know a
> > better way to deal with it.
[...]
> Now that you've pointed this mechanism out, it is utterly and
> completely broken and should be removed from the kernel outright or at
> least severely restricted. A .read implementation MUST NOT ACT ON THE
> CALLING TASK. Ever. Just imagine the effect of passing a userfaultfd
> as stdin to a setuid program.
>
> So I think the right solution might be to attempt to *remove*
> UFFD_EVENT_FORK. Maybe the solution is to say that, unless the
> creator of a userfaultfd() has global CAP_SYS_ADMIN, then it cannot
> use UFFD_FEATURE_EVENT_FORK and print a warning (once) when
> UFFD_FEATURE_EVENT_FORK is allowed. And, after some suitable
> deprecation period, just remove it. If it's genuinely useful, it
> needs an entirely new API based on ioctl() or a syscall. Or even
> recvmsg() :)

FWIW, <https://codesearch.debian.net/search?q=UFFD_FEATURE_EVENT_FORK&literal=1>
just shows the kernel, kernel selftests, and strace code for decoding
syscall arguments. CRIU uses it, though (probably for postcopy live
migration / lazy migration?); I guess that code isn't in Debian for
some reason.
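
To make the proposed restriction a bit more concrete, here is a rough userspace model of the policy Andy describes above (refuse UFFD_FEATURE_EVENT_FORK unless the creator has CAP_SYS_ADMIN, and warn once when it is allowed). This is only a sketch, not kernel code: sketch_check_features(), warned_once and SKETCH_FEATURE_EVENT_FORK are hypothetical names made up for illustration, the bit value is a stand-in, and the capability check is reduced to a boolean parameter.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the UFFD_FEATURE_EVENT_FORK bit; the value
 * is assumed for this sketch only. */
#define SKETCH_FEATURE_EVENT_FORK (1ULL << 1)

/* Tracks whether the deprecation warning has already been printed. */
static bool warned_once;

/*
 * Model of the proposed check. Returns 0 and stores the accepted feature
 * mask in *granted, or returns -1 (think -EPERM) and leaves *granted
 * untouched when an unprivileged caller asks for fork events.
 */
static int sketch_check_features(uint64_t requested, bool has_cap_sys_admin,
                                 uint64_t *granted)
{
    if (requested & SKETCH_FEATURE_EVENT_FORK) {
        if (!has_cap_sys_admin)
            return -1;

        /* Privileged caller: allow it, but warn exactly once. */
        if (!warned_once) {
            warned_once = true;
            fprintf(stderr, "fork events requested: deprecated feature\n");
        }
    }

    *granted = requested;
    return 0;
}
```

With this model, an unprivileged request that includes the fork-event bit fails outright, while a privileged one succeeds and triggers the warning a single time. Whether the real check should fail the request or silently mask the bit is an open design question that this sketch does not settle.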