Re: [PATCH] mm: fix possible cause of a page_mapped BUG

On Wed, May 4, 2011 at 5:09 PM, Michel Lespinasse <walken@xxxxxxxxxx> wrote:
>
> FYI, the attached code causes an infinite loop in kernels that have
> the 95042f9eb7 commit:

Mmm.

Yes. The atomic fault will never work, and the get_user_pages() thing
won't either, so things will just loop forever.

> Linus, I am not sure as to what would be the preferred way to fix
> this. One option could be to modify fault_in_user_writeable so that it
> passes a non-NULL page pointer, and just does a put_page on it
> afterwards. While this would work, this is kinda ugly and would slow
> down futex operations somewhat.

No, that's just ugly as hell.

> A more conservative alternative could
> be to enable the guard page special case under a new GUP flag, but
> this loses much of the elegance of your original proposal...

How about doing that only for FOLL_MLOCK?

Also, looking at mm/mlock.c, why _do_ we call get_user_pages() even if
the vma isn't mlocked? That looks bogus. Since we have dropped the
mm_semaphore, an unlock may have happened, and afaik we should *not*
try to bring those pages back in at all. There's this whole comment
about that in the caller ("__mlock_vma_pages_range() double checks the
vma flags, so that it won't mlock pages if the vma was already
munlocked."), but despite that it would actually call
__get_user_pages() even if the VM_LOCKED bit had been cleared (it just
wouldn't call it with the FOLL_MLOCK flag).

So maybe something like the attached?

UNTESTED! And maybe there was some really subtle reason to still call
__get_user_pages() without that FOLL_MLOCK thing that I'm missing.

                           Linus
 mm/memory.c |    2 +-
 mm/mlock.c  |    8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 607098d47e74..f7a487c908a5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1555,7 +1555,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		 * If we don't actually want the page itself,
 		 * and it's the stack guard page, just skip it.
 		 */
-		if (!pages && stack_guard_page(vma, start))
+		if (!pages && (gup_flags & FOLL_MLOCK) && stack_guard_page(vma, start))
 			goto next_page;
 
 		do {
diff --git a/mm/mlock.c b/mm/mlock.c
index 6b55e3efe0df..8ed7fd09f81c 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -162,7 +162,10 @@ static long __mlock_vma_pages_range(struct vm_area_struct *vma,
 	VM_BUG_ON(end   > vma->vm_end);
 	VM_BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
 
-	gup_flags = FOLL_TOUCH;
+	if (!(vma->vm_flags & VM_LOCKED))
+		return nr_pages;
+
+	gup_flags = FOLL_TOUCH | FOLL_MLOCK;
 	/*
 	 * We want to touch writable mappings with a write fault in order
 	 * to break COW, except for shared mappings because these don't COW
@@ -178,9 +181,6 @@ static long __mlock_vma_pages_range(struct vm_area_struct *vma,
 	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC))
 		gup_flags |= FOLL_FORCE;
 
-	if (vma->vm_flags & VM_LOCKED)
-		gup_flags |= FOLL_MLOCK;
-
 	return __get_user_pages(current, mm, addr, nr_pages, gup_flags,
 				NULL, NULL, nonblocking);
 }
