On 09/22/22 09:46, David Hildenbrand wrote:
> On 22.09.22 01:57, Mike Kravetz wrote:
> > On 09/21/22 10:48, Mike Kravetz wrote:
> > > On 09/21/22 16:34, Liu Shixin wrote:
> > > > The vma_lock and hugetlb_fault_mutex are dropped before handling
> > > > userfault and reacquire them again after handle_userfault(), but
> > > > reacquire the vma_lock could lead to UAF[1] due to the following
> > > > race,
> > > >
> > > > hugetlb_fault
> > > >   hugetlb_no_page
> > > >     /*unlock vma_lock */
> > > >     hugetlb_handle_userfault
> > > >       handle_userfault
> > > >         /* unlock mm->mmap_lock*/
> > > >                                            vm_mmap_pgoff
> > > >                                              do_mmap
> > > >                                                mmap_region
> > > >                                                  munmap_vma_range
> > > >                                                    /* clean old vma */
> > > >         /* lock vma_lock again  <--- UAF */
> > > >     /* unlock vma_lock */
> > > >
> > > > Since the vma_lock will unlock immediately after hugetlb_handle_userfault(),
> > > > let's drop the unneeded lock and unlock in hugetlb_handle_userfault() to fix
> > > > the issue.
> > >
> > > Thank you very much!
> > >
> > > When I saw this report, the obvious fix was to do something like what you have
> > > done below.  That looks fine with a few minor comments.
> > >
> > > One question I have not yet answered is, "Does this same issue apply to
> > > follow_hugetlb_page()?".  I believe it does.  follow_hugetlb_page calls
> > > hugetlb_fault which could result in the fault being processed by userfaultfd.
> > > If we experience the race above, then the associated vma could no longer be
> > > valid when returning from hugetlb_fault.  follow_hugetlb_page and callers
> > > have a flag (locked) to deal with dropping mmap lock.  However, I am not sure
> > > if it is handled correctly WRT userfaultfd.  I think this needs to be answered
> > > before fixing.  And, if the follow_hugetlb_page code needs to be fixed it
> > > should be done at the same time.
> > >
> >
> > To at least verify this code path, I added userfaultfd handling to the gup_test
> > program in kernel selftests.
> > When doing basic gup test on a hugetlb page in
> > a userfaultfd registered range, I hit this warning:
> >
> > [ 6939.867796] FAULT_FLAG_ALLOW_RETRY missing 1
> > [ 6939.871503] CPU: 2 PID: 5720 Comm: gup_test Not tainted 6.0.0-rc6-next-20220921+ #72
> > [ 6939.874562] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
> > [ 6939.877707] Call Trace:
> > [ 6939.878745]  <TASK>
> > [ 6939.879779]  dump_stack_lvl+0x6c/0x9f
> > [ 6939.881199]  handle_userfault.cold+0x14/0x1e
> > [ 6939.882830]  ? find_held_lock+0x2b/0x80
> > [ 6939.884370]  ? __mutex_unlock_slowpath+0x45/0x280
> > [ 6939.886145]  hugetlb_handle_userfault+0x90/0xf0
> > [ 6939.887936]  hugetlb_fault+0xb7e/0xda0
> > [ 6939.889409]  ? vprintk_emit+0x118/0x3a0
> > [ 6939.890903]  ? _printk+0x58/0x73
> > [ 6939.892279]  follow_hugetlb_page.cold+0x59/0x145
> > [ 6939.894116]  __get_user_pages+0x146/0x750
> > [ 6939.895580]  __gup_longterm_locked+0x3e9/0x680
> > [ 6939.897023]  ? seqcount_lockdep_reader_access.constprop.0+0xa5/0xb0
> > [ 6939.898939]  ? lockdep_hardirqs_on+0x7d/0x100
> > [ 6939.901243]  gup_test_ioctl+0x320/0x6e0
> > [ 6939.902202]  __x64_sys_ioctl+0x87/0xc0
> > [ 6939.903220]  do_syscall_64+0x38/0x90
> > [ 6939.904233]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > [ 6939.905423] RIP: 0033:0x7fbb53830f7b
> >
> > This is because userfaultfd is expecting FAULT_FLAG_ALLOW_RETRY which is not
> > set in this path.
>
> Right. Without being able to drop the mmap lock, we cannot continue. And we
> don't know if we can drop it without FAULT_FLAG_ALLOW_RETRY.
>
> FAULT_FLAG_ALLOW_RETRY is only set when we can communicate to the caller
> that we dropped the mmap lock [e.g., int *locked parameter].
>
> All code paths that pass NULL won't be able to handle -- especially
> surprisingly also pin_user_pages_fast() -- cannot trigger usefaultfd and
> will result in this warning.
>
>
> A "sane" example is access via /proc/self/mem via ptrace: we don't want to
> trigger userfaultfd, but instead simply fail the GUP get/pin.
>
>
> Now, this is just a printed *warning* (not a WARN/BUG/taint) that tells us
> that there is a GUP user that isn't prepared for userfaultfd. So it rather
> points out a missing GUP adaption -- incomplete userfaultfd support. And we
> seem to have plenty of that judging that pin_user_pages_fast_only().
>
> Maybe the printed stack trace is a bit too much and makes this look very
> scary.
>
> >
> > Adding John, Peter and David on Cc: as they are much more fluent in all the
> > fault and FOLL combinations and might have immediate suggestions.  It is going
> > to take me a little while to figure out:
> > 1) How to make sure we get the right flags passed to handle_userfault
>
> This is a GUP caller problem -- or rather, how GUP has to deal with
> userfaultfd.
>
> > 2) How to modify follow_hugetlb_page as userfaultfd can certainly drop
> >    mmap_lock.  So we can not assume vma still exists upon return.
>
> Again, we have to communicate to the GUP caller that we dropped the mmap
> lock. And that requires GUP caller changes.
>

Thank you and Peter for replying!

The 'good news' is that there does not appear to be a case where userfaultfd
(via hugetlb_fault) drops the lock and follow_hugetlb_page is not prepared
for the consequences.  So, unlike hugetlb_handle_userfault, this is not an
exposure that needs an immediate fix; a fix like the one originally proposed
here is sufficient.

We can think about whether this specific calling sequence needs to be modified.
--
Mike Kravetz