Hello everyone,

Mike noticed the userfaultfd selftest on 32-way SMP bare metal triggers a SIGBUS, and I could reproduce it with KVM -smp 32 Fedora guests. This never happened with fewer CPUs.

I grabbed the stack trace of the false wakeup. It looks like the rwsem code can wake the task unexpectedly.

We knew the scheduler didn't like the idea of providing perfect wakeups, and the current approach of relying on scheduler wakeups being perfect has already been questioned, but this seems to be a somewhat wider issue that affects parts outside of the scheduler too. I was undecided about whether the scheduler could ignore a __set_current_state(TASK_INTERRUPTIBLE); schedule(); sequence, and in fact it never did, but the rwsem wakeup is external to the scheduler, so we need to fix this regardless.

The SMP race condition is so rare that it has never reproduced with fewer than 32 CPUs, and it almost certainly won't happen on anything but the selftest itself. If it does happen, the only side effect is the SIGBUS killing the task gracefully: there's zero risk of memory corruption, zero fs corruption, no memleaks, and there are no security issues associated with this race condition either. So it's not a major concern, but it must still be fixed ASAP. For those usages where userfaultfd is used for vma-less enforcement of no access over memory holes, it would be an even lesser concern.

This is very lightly tested so far, comments welcome.

Note: this issue is fully orthogonal to the new userfaultfd features in -mm, so I'm posting this against upstream.

Note2: my userfault aa.git branch has the one-more-time VM_FAULT_RETRY feature for the WP support, which hides the problem 100% successfully, but only by luck. That feature is needed for another reason and it will still be needed on top of this. The chance of hitting the race twice in a row is zero in practice, which is why the problem gets hidden, but in theory it could happen twice in a row and the fix must handle that case too.
So this patch has to be tested against current upstream only, or you won't be able to reproduce the problem in the first place. Once we converge on a reviewed solution I'll forward port it to the -mm/userfault branch (I doubt it applies cleanly).

Andrea Arcangeli (1):
  userfaultfd: fix SIGBUS resulting from false rwsem wakeups

 fs/userfaultfd.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)
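
To illustrate the point in Note2 about repeated false wakeups, here is a small userspace simulation. It is only a sketch and not the code in the patch: struct fake_fault, fake_schedule(), wait_once() and wait_in_loop() are all hypothetical names made up for this example, and the "scheduler" is a plain function that delivers a configurable number of false wakeups before the real one. It shows why a waiter that trusts its first wakeup can return too early, while a waiter that re-checks its condition after every wakeup tolerates any number of false wakeups in a row.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state a faulting task sleeps on. */
struct fake_fault {
    bool resolved;       /* true once the fault has really been handled */
    int  false_wakeups;  /* false wakeups still to be delivered */
};

/*
 * Simulated schedule(): returns on every wakeup, real or false.
 * A false wakeup leaves the fault unresolved.
 */
static void fake_schedule(struct fake_fault *f)
{
    assert(f->false_wakeups >= 0);
    if (f->false_wakeups > 0)
        f->false_wakeups--;   /* woken up, but nothing was resolved */
    else
        f->resolved = true;   /* the real wakeup */
}

/*
 * Single-shot wait: sleeps once and trusts the wakeup.
 * Returns whether the fault was actually resolved.
 */
static bool wait_once(struct fake_fault *f)
{
    if (!f->resolved)
        fake_schedule(f);
    return f->resolved;
}

/*
 * Looping wait: re-checks the condition after every wakeup.
 * Returns how many times it had to sleep.
 */
static int wait_in_loop(struct fake_fault *f)
{
    int sleeps = 0;

    while (!f->resolved) {
        fake_schedule(f);
        sleeps++;
    }
    return sleeps;
}
```

With two pending false wakeups, wait_once() returns with the fault still unresolved (the analogue of the SIGBUS case), while wait_in_loop() sleeps three times and only returns once the fault is resolved.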