Ok, so here's a slightly different approach. It still makes FOLL_MLOCK unconditional in the mlock path, but it really just pushes down the

  -	gup_flags = FOLL_TOUCH;
  +	gup_flags = FOLL_TOUCH | FOLL_MLOCK;
  ...
  -	if (vma->vm_flags & VM_LOCKED)
  -		gup_flags |= FOLL_MLOCK;

from __mlock_vma_pages_range(), and moves the conditional into 'follow_page()' (which is the only _user_ of that flag) instead:

  -	if (flags & FOLL_MLOCK) {
  +	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {

so semantically this changes nothing at all.

But now, because __get_user_pages() can look at FOLL_MLOCK to see that it's an mlock access, we can do that whole "skip stack guard page" check based on that flag:

  -		if (!pages && stack_guard_page(vma, start))
  +		if ((gup_flags & FOLL_MLOCK) && stack_guard_page(vma, start))

which means that other uses will try to page in the stack guard page.

I seriously considered making that "skip stack guard page" logic and the "mlock lookup" two separate bits, because conceptually they are really pretty independent, but right now the only _users_ seem to be tied together, so I kept it as one single bit (FOLL_MLOCK).

But as far as I can tell, the attached patch is 100% equivalent to what we do now, except for that "skip stack guard pages only for mlock" change.

Comments? I like this patch because it seems to make the logic more straightforward, but somebody else should double-check my logic.

		Linus
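[ Side note, not part of the patch: the stack_guard_page() test that the
  mlock path now skips is roughly the following - quoting from memory of
  the current tree, so treat the exact helper body and the
  vma_stack_continue() name as an approximation rather than verbatim:

	static inline int stack_guard_page(struct vm_area_struct *vma, unsigned long addr)
	{
		/*
		 * Only the lowest page of a grows-down stack vma, with no
		 * other stack vma sitting directly below it, counts as the
		 * guard page.
		 */
		return (vma->vm_flags & VM_GROWSDOWN) &&
			(vma->vm_start == addr) &&
			!vma_stack_continue(vma->vm_prev, addr);
	}

  so the behavioral difference really is just that non-mlock gup callers
  now fault that page in like any other, while mlock keeps skipping it. ]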
 mm/memory.c |    7 +++----
 mm/mlock.c  |    5 +----
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 607098d47e74..27f425378112 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1359,7 +1359,7 @@ split_fallthrough:
 		 */
 		mark_page_accessed(page);
 	}
-	if (flags & FOLL_MLOCK) {
+	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
 		/*
 		 * The preliminary mapping check is mainly to avoid the
 		 * pointless overhead of lock_page on the ZERO_PAGE
@@ -1552,10 +1552,9 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		}
 
 		/*
-		 * If we don't actually want the page itself,
-		 * and it's the stack guard page, just skip it.
+		 * For mlock, just skip the stack guard page.
 		 */
-		if (!pages && stack_guard_page(vma, start))
+		if ((gup_flags & FOLL_MLOCK) && stack_guard_page(vma, start))
 			goto next_page;
 
 		do {
diff --git a/mm/mlock.c b/mm/mlock.c
index 6b55e3efe0df..516b2c2ddd5a 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -162,7 +162,7 @@ static long __mlock_vma_pages_range(struct vm_area_struct *vma,
 	VM_BUG_ON(end > vma->vm_end);
 	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
 
-	gup_flags = FOLL_TOUCH;
+	gup_flags = FOLL_TOUCH | FOLL_MLOCK;
 	/*
 	 * We want to touch writable mappings with a write fault in order
 	 * to break COW, except for shared mappings because these don't COW
@@ -178,9 +178,6 @@ static long __mlock_vma_pages_range(struct vm_area_struct *vma,
 	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC))
 		gup_flags |= FOLL_FORCE;
 
-	if (vma->vm_flags & VM_LOCKED)
-		gup_flags |= FOLL_MLOCK;
-
 	return __get_user_pages(current, mm, addr, nr_pages, gup_flags,
 				NULL, NULL, nonblocking);
 }
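[ For context, since the first mm/memory.c hunk gets cut off mid-comment
  by the diff context lines: with the patch applied, the whole FOLL_MLOCK
  block in follow_page() should read roughly as below. I'm reconstructing
  the body from memory, so the comment wording may not be verbatim - the
  only thing the patch actually changes here is the VM_LOCKED test in the
  outer condition:

	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
		/*
		 * The preliminary mapping check is mainly to avoid the
		 * pointless overhead of lock_page on the ZERO_PAGE
		 * which might bounce very badly if there is contention.
		 *
		 * If the page is already locked, we don't need to
		 * handle it now - vmscan will handle it later if and
		 * when it attempts to reclaim the page.
		 */
		if (page->mapping && trylock_page(page)) {
			lru_add_drain();  /* push cached pages to LRU */
			/*
			 * Re-check page->mapping under the page lock:
			 * file-cache truncation may have dropped it.
			 */
			if (page->mapping)
				mlock_vma_page(page);
			unlock_page(page);
		}
	}
]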