On Wed, May 4, 2011 at 5:09 PM, Michel Lespinasse <walken@xxxxxxxxxx> wrote:
>
> FYI, the attached code causes an infinite loop in kernels that have
> the 95042f9eb7 commit:

Mmm. Yes. The atomic fault will never work, and the get_user_pages()
thing won't either, so things will just loop forever.

> Linus, I am not sure what would be the preferred way to fix this.
> One option could be to modify fault_in_user_writeable so that it
> passes a non-NULL page pointer, and just does a put_page on it
> afterwards. While this would work, it is kinda ugly and would slow
> down futex operations somewhat.

No, that's just ugly as hell.

> A more conservative alternative could
> be to enable the guard page special case under a new GUP flag, but
> this loses much of the elegance of your original proposal...

How about doing that only for FOLL_MLOCK?

Also, looking at mm/mlock.c, why _do_ we call get_user_pages() even if
the vma isn't mlocked? That looks bogus. Since we have dropped the
mm_semaphore, a munlock may have happened, and afaik we should *not*
try to bring those pages back in at all.

There's this whole comment about that in the caller
("__mlock_vma_pages_range() double checks the vma flags, so that it
won't mlock pages if the vma was already munlocked."), but despite
that it would actually call __get_user_pages() even if the VM_LOCKED
bit had been cleared (it just wouldn't call it with the FOLL_MLOCK
flag).

So maybe something like the attached? UNTESTED! And maybe there was
some really subtle reason to still call __get_user_pages() without
that FOLL_MLOCK thing that I'm missing.

              Linus
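[Editor's note: for readers without the tree handy, here is a rough sketch of
why the retry never terminates. It paraphrases the 2.6.39-era
kernel/futex.c from memory rather than quoting it verbatim, and the caller
pattern in the trailing comment is simplified; details are approximate.]

	/*
	 * Sketch (paraphrased) of kernel/futex.c's fault_in_user_writeable().
	 * It passes pages == NULL, so after commit 95042f9eb7 the stack guard
	 * page is silently skipped -- yet the call still reports success, and
	 * nothing has actually been faulted in.
	 */
	static int fault_in_user_writeable(u32 __user *uaddr)
	{
		struct mm_struct *mm = current->mm;
		int ret;

		down_read(&mm->mmap_sem);
		ret = get_user_pages(current, mm, (unsigned long)uaddr,
				     1, 1, 0, NULL, NULL);
		up_read(&mm->mmap_sem);

		return ret < 0 ? ret : 0;
	}

	/*
	 * Caller pattern (e.g. futex_wake_op(), simplified): the atomic op on
	 * the futex word hits the unpopulated guard page and gets -EFAULT, the
	 * "successful" fault-in above fixes nothing, and the retry spins forever.
	 *
	 *	op_ret = futex_atomic_op_inuser(op, uaddr2);
	 *	if (op_ret == -EFAULT) {
	 *		if (fault_in_user_writeable(uaddr2))
	 *			return -EFAULT;
	 *		goto retry;
	 *	}
	 */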
 mm/memory.c | 2 +-
 mm/mlock.c  | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 607098d47e74..f7a487c908a5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1555,7 +1555,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		 * If we don't actually want the page itself,
 		 * and it's the stack guard page, just skip it.
 		 */
-		if (!pages && stack_guard_page(vma, start))
+		if (!pages && (gup_flags & FOLL_MLOCK) && stack_guard_page(vma, start))
 			goto next_page;
 
 		do {
diff --git a/mm/mlock.c b/mm/mlock.c
index 6b55e3efe0df..8ed7fd09f81c 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -162,7 +162,10 @@ static long __mlock_vma_pages_range(struct vm_area_struct *vma,
 	VM_BUG_ON(end > vma->vm_end);
 	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
 
-	gup_flags = FOLL_TOUCH;
+	if (!(vma->vm_flags & VM_LOCKED))
+		return nr_pages;
+
+	gup_flags = FOLL_TOUCH | FOLL_MLOCK;
 	/*
 	 * We want to touch writable mappings with a write fault in order
 	 * to break COW, except for shared mappings because these don't COW
@@ -178,9 +181,6 @@ static long __mlock_vma_pages_range(struct vm_area_struct *vma,
 	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC))
 		gup_flags |= FOLL_FORCE;
 
-	if (vma->vm_flags & VM_LOCKED)
-		gup_flags |= FOLL_MLOCK;
-
 	return __get_user_pages(current, mm, addr, nr_pages, gup_flags,
 				NULL, NULL, nonblocking);
 }
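[Editor's note: for readability, this is how the top of
__mlock_vma_pages_range() reads with both mlock.c hunks applied,
reassembled only from the hunks and their context lines above; everything
the diff does not show is elided as /* ... */, and the parameter list is
abbreviated as in the hunk header.]

	static long __mlock_vma_pages_range(struct vm_area_struct *vma,
			/* ... start, end, nonblocking as before ... */)
	{
		/* ... */
		VM_BUG_ON(end > vma->vm_end);
		VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));

		/*
		 * The vma was munlocked while mmap_sem was dropped: don't fault
		 * anything back in.  Returning nr_pages makes the caller treat
		 * the whole range as handled and move on.
		 */
		if (!(vma->vm_flags & VM_LOCKED))
			return nr_pages;

		gup_flags = FOLL_TOUCH | FOLL_MLOCK;
		/*
		 * We want to touch writable mappings with a write fault in order
		 * to break COW, except for shared mappings because these don't COW
		 * ...
		 */

		/* ... */

		if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC))
			gup_flags |= FOLL_FORCE;

		return __get_user_pages(current, mm, addr, nr_pages, gup_flags,
					NULL, NULL, nonblocking);
	}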