On Tue, Oct 31, 2017 at 6:57 AM, Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> However, __do_page_fault() only expects that mmap_sem to be released
> when handle_mm_fault() returns with VM_FAULT_RETRY. It doesn't expect it
> to be released and then acquired again, because then vma can be indeed
> gone.

Yes. Accessing "vma" after calling "handle_mm_fault()" is a bug. An
unfortunate issue with userfaultfd.

The suggested fix to simply look up pkey beforehand seems sane and simple.

But sadly, from a quick check, it looks like arch/um/ has the same bug,
but even worse. It will do

 (a) handle_mm_fault() in a loop without re-calculating vma. Don't ask me why.

 (b) flush_tlb_page(vma, address); afterwards

but much more importantly, I think __get_user_pages() is broken in two ways:

 - faultin_page() does:

      ret = handle_mm_fault(vma, address, fault_flags);
      ...
      if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))

   (easily fixed the same way)

 - more annoyingly and harder to fix: the retry case in
   __get_user_pages(), and the VMA saving there.

Ho humm.

Andrea, looking at that get_user_pages() case, I really think it's
userfaultfd that is broken. Could we perhaps limit userfaultfd to _only_
do the VM_FAULT_RETRY, and simply fail for non-retry faults?

               Linus
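The "look up pkey beforehand" fix amounts to copying whatever the caller needs out of the VMA before handle_mm_fault() gets a chance to drop mmap_sem, and never touching the VMA pointer afterwards. A minimal user-space sketch of that pattern, assuming a toy model: fake_vma, fake_handle_mm_fault(), fault_with_snapshot(), write_fault_on_readonly() and the FAKE_* flags are hypothetical stand-ins, not kernel code, and "freeing the VMA" merely models another thread unmapping it while the lock was dropped.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical, heavily simplified model. None of these names are
 * kernel APIs; they only mimic the shape of the code under discussion.
 */
#define FAKE_VM_WRITE     0x2UL
#define FAKE_FAULT_WRITE  0x1
#define FAKE_FAULT_RETRY  0x2

struct fake_vma {
	unsigned long vm_flags;
	int pkey;
};

/* Fields the caller needs after the fault, copied while the VMA is valid. */
struct fault_snapshot {
	unsigned long vm_flags;
	int pkey;
};

/*
 * Models a fault handler that may drop mmap_sem. If drop_lock is set we
 * pretend the VMA was unmapped meanwhile by freeing it, so the caller's
 * pointer becomes dangling. Otherwise it reports a write fault.
 */
static int fake_handle_mm_fault(struct fake_vma *vma, int drop_lock)
{
	if (drop_lock) {
		free(vma);
		return FAKE_FAULT_RETRY;
	}
	return FAKE_FAULT_WRITE;
}

/*
 * Safe pattern: read everything needed from the VMA before the call,
 * then never dereference 'vma' afterwards.
 */
static int fault_with_snapshot(struct fake_vma *vma, int drop_lock,
			       struct fault_snapshot *snap)
{
	int ret;

	assert(vma != NULL);
	snap->vm_flags = vma->vm_flags;
	snap->pkey = vma->pkey;

	ret = fake_handle_mm_fault(vma, drop_lock);
	/* 'vma' may be freed at this point; only 'snap' is used from here on. */
	return ret;
}

/* Same shape as the faultin_page() check, but evaluated on the snapshot. */
static int write_fault_on_readonly(int ret, const struct fault_snapshot *snap)
{
	return (ret & FAKE_FAULT_WRITE) && !(snap->vm_flags & FAKE_VM_WRITE);
}
```

Calling fault_with_snapshot() with drop_lock set frees the VMA inside the call, yet the caller can still evaluate the faultin_page()-style check from the snapshot. This only covers the simple "read a field after the fault" case; it does not address the __get_user_pages() retry path, which also keeps a VMA pointer across the call.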