On Fri, Oct 23, 2020 at 7:48 PM Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote: > > Hello everyone, > > On Sat, Oct 10, 2020 at 11:24:56PM -0700, Lokesh Gidra wrote: > > With this change, when the knob is set to 0, it allows unprivileged > > users to call userfaultfd, like when it is set to 1, but with the > > restriction that page faults from only user-mode can be handled. > > In this mode, an unprivileged user (without SYS_CAP_PTRACE capability) > > must pass UFFD_USER_MODE_ONLY to userfaultd or the API will fail with > > EPERM. > > > > This enables administrators to reduce the likelihood that > > an attacker with access to userfaultfd can delay faulting kernel > > code to widen timing windows for other exploits. > > > > The default value of this knob is changed to 0. This is required for > > correct functioning of pipe mutex. However, this will fail postcopy > > live migration, which will be unnoticeable to the VM guests. To avoid > > this, set 'vm.userfault = 1' in /sys/sysctl.conf. For more details, > > refer to Andrea's reply [1]. > > > > [1] https://lore.kernel.org/lkml/20200904033438.GI9411@xxxxxxxxxx/ > > > > Signed-off-by: Lokesh Gidra <lokeshgidra@xxxxxxxxxx> > > Nobody commented so it seems everyone is on board with this change to > synchronize the kernel default with the post-boot Android default. > > The email in the link above was pretty long, so the below would be a > summary that could be added to the commit header: > > == > > The main reason this change is desirable as in the short term is that > the Android userland will behave as with the sysctl set to zero. So > without this commit, any Linux binary using userfaultfd to manage its > memory would behave differently if run within the Android userland. > > == Sure. I'll add it in the next revision. > > Reviewed-by: Andrea Arcangeli <aarcange@xxxxxxxxxx> > Thanks so much for the review. I hope it's ok to add your 'reviewed-by' in the next revision? > > BTW, this is still a minor nitpick, but a printk_once of the 1/2 could > be added before the return -EPERM too, that's actually what I meant > when I suggested to add a printk_once :), however the printk_once you > added can turn out to be useful too for devs converting code to use > bounce buffers, so it's fine too, just it could go under DEBUG_VM and > to be ratelimited (similarly to the "FAULT_FLAG_ALLOW_RETRY missing > %x\n" printk). I'll move the printk_once from 1/2 to this patch, as you suggested. > > Thanks, > Andrea >