On Tue, Mar 8, 2022 at 9:40 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hmm. The futex code actually uses "fixup_user_fault()" for this case.
> Maybe fault_in_safe_writeable() should do that?

.. paging all the bits back in, I'm reminded that one of the worries
was "what about large areas".

But I really think that the solution should be that we limit the size
of fault_in_safe_writeable() to just a couple of pages. Even if you
were to fault in gigabytes, page-out can undo it anyway, so there is
no semantic reason why that function should ever do more than a few
pages to make sure. There's already even a comment about how there's
no guarantee that the pages will stay.

Side note: the current GUP-based fault_in_safe_writeable() is buggy in
another way anyway: it doesn't work right for stack-extending
accesses.

So I think the fix for this all might be something like the attached
(TOTALLY UNTESTED)! Comments? Andreas, mind (carefully - maybe it is
totally broken and does unspeakable acts to your pets) testing this?

                  Linus
 mm/gup.c | 40 ++++++++++++----------------------------
 1 file changed, 12 insertions(+), 28 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a9d4d724aef7..9e085e7b9c28 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1745,44 +1745,28 @@ EXPORT_SYMBOL(fault_in_writeable);
 size_t fault_in_safe_writeable(const char __user *uaddr, size_t size)
 {
 	unsigned long start = (unsigned long)untagged_addr(uaddr);
-	unsigned long end, nstart, nend;
+	unsigned long end, nstart;
 	struct mm_struct *mm = current->mm;
-	struct vm_area_struct *vma = NULL;
-	int locked = 0;
+	const unsigned int fault_flags = FAULT_FLAG_WRITE | FAULT_FLAG_KILLABLE;
+	const size_t max_size = 4 * PAGE_SIZE;
 
 	nstart = start & PAGE_MASK;
-	end = PAGE_ALIGN(start + size);
+	end = PAGE_ALIGN(start + min(size, max_size));
 	if (end < nstart)
 		end = 0;
-	for (; nstart != end; nstart = nend) {
-		unsigned long nr_pages;
-		long ret;
 
-		if (!locked) {
-			locked = 1;
-			mmap_read_lock(mm);
-			vma = find_vma(mm, nstart);
-		} else if (nstart >= vma->vm_end)
-			vma = vma->vm_next;
-		if (!vma || vma->vm_start >= end)
-			break;
-		nend = end ? min(end, vma->vm_end) : vma->vm_end;
-		if (vma->vm_flags & (VM_IO | VM_PFNMAP))
-			continue;
-		if (nstart < vma->vm_start)
-			nstart = vma->vm_start;
-		nr_pages = (nend - nstart) / PAGE_SIZE;
-		ret = __get_user_pages_locked(mm, nstart, nr_pages,
-					      NULL, NULL, &locked,
-					      FOLL_TOUCH | FOLL_WRITE);
-		if (ret <= 0)
+	mmap_read_lock(mm);
+	for (; nstart != end; nstart += PAGE_SIZE) {
+		if (fixup_user_fault(mm, nstart, fault_flags, NULL))
 			break;
-		nend = nstart + ret * PAGE_SIZE;
 	}
-	if (locked)
-		mmap_read_unlock(mm);
+	mmap_read_unlock(mm);
+
+	/* If we got all of our (truncated) fault-in, we claim we got it all */
 	if (nstart == end)
 		return 0;
+
+	/* .. otherwise we'll use the original untruncated size */
 	return size - min_t(size_t, nstart - start, size);
 }
 EXPORT_SYMBOL(fault_in_safe_writeable);
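
For context, a minimal caller-side sketch of the retry pattern that
fault_in_safe_writeable() (declared in <linux/pagemap.h>) is meant to
serve. This is not part of the patch above; do_copy_to_user_chunk() is a
made-up placeholder for a copy that runs with page faults disabled (e.g.
under filesystem locks) and returns how many bytes it managed to copy.
Because the caller loops and retries after each fault-in, faulting in
only a few pages per call is enough:

/*
 * Hypothetical caller, for illustration only.  The copy is attempted
 * with faults disabled; on a short copy we drop the locks (not shown),
 * fault the destination in, and retry.
 */
static ssize_t copy_with_faultin(char __user *dst, const char *src, size_t len)
{
	size_t copied = 0;

	while (copied < len) {
		size_t remaining = len - copied;

		/* Copy as much as possible without taking page faults. */
		copied += do_copy_to_user_chunk(dst + copied, src + copied,
						remaining);
		if (copied == len)
			break;

		/*
		 * Short copy: fault in the next part of the destination.
		 * fault_in_safe_writeable() returns the number of bytes it
		 * could NOT fault in, so a return value equal to the
		 * requested size means no forward progress is possible.
		 */
		remaining = len - copied;
		if (fault_in_safe_writeable(dst + copied, remaining) == remaining)
			return copied ? copied : -EFAULT;
	}
	return copied;
}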