On Tue, Oct 31, 2017 at 08:37:47AM -0700, Linus Torvalds wrote: > Yes. Accessing "vma" after calling "handle_mm_fault()" is a bug. An > unfortunate issue with userfaultfd. > > The suggested fix to simply look up pkey beforehand seems sane and simple. Agreed. > > But sadly, from a quick check, it looks like arch/um/ has the same > bug, but even worse. It will do > > (a) handle_mm_fault() in a loop without re-calculating vma. Don't ask me why. > > (b) flush_tlb_page(vma, address); afterwards Yes, that flush_tlb_page is unsafe. Luckily it's only using it for vma->vm_mm so it doesn't sound major issue to fix it. > > but much more importantly, I think __get_user_pages() is broken in two ways: > > - faultin_page() does: > > ret = handle_mm_fault(vma, address, fault_flags); > ... > if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE)) > > (easily fixed the same way) > > - more annoyingly and harder to fix: the retry case in > __get_user_pages(), and the VMA saving there. > > Ho humm. > > Andrea, looking at that get_user_pages() case, I really think it's > userfaultfd that is broken. > > Could we perhaps limit userfaultfd to _only_ do the VM_FAULT_RETRY, > and simply fail for non-retry faults? In the get_user_pages case we already limit it to do only VM_FAULT_RETRY so no use after free should materialize whenever gup is involved. The problematic path for the return to userland (get_user_pages returns to kernel) is this one: if (return_to_userland) { if (signal_pending(current) && !fatal_signal_pending(current)) { /* * If we got a SIGSTOP or SIGCONT and this is * a normal userland page fault, just let * userland return so the signal will be * handled and gdb debugging works. The page * fault code immediately after we return from * this function is going to release the * mmap_sem and it's not depending on it * (unlike gup would if we were not to return * VM_FAULT_RETRY). * * If a fatal signal is pending we still take * the streamlined VM_FAULT_RETRY failure path * and there's no need to retake the mmap_sem * in such case. */ down_read(&mm->mmap_sem); ret = VM_FAULT_NOPAGE; } } We could remove the above branch all together and then handle_userfault() would always return VM_FAULT_RETRY whenever it decides to release the mmap_sem. The above makes debugging with gdb more user friendly and it potentially lowers the latency of signals as signals can unblock handle_userfault. The downside is that the return to userland cannot dereference the vma after calling handle_mm_fault. Thanks, Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>