Re: [PATCH] Don't mlock guardpage if the stack is growing up

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, May 9, 2011 at 8:57 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hmm. One thing that strikes me is this problem also implies that the
> /proc/self/maps file is wrong for the GROWSUP case, isn't it?
>
> So I think we should not just apply your lock fix, but then *also*
> apply something like this:

Actually, I think we might be better off with something like this.

It makes a few more changes:

 - move the stack guard page checking in __get_user_pages() into the
rare case (ie we didn't find a page), since that's the only case we
care about (the thing about the guard page is that don't want to call
"handle_mm_fault()"). As a result, it's off any path where we can
possibly care about performance, so we might as well have a nice
helper function for both the grow-up and grow-down cases, instead of
trying to be clever and only look at the grow-down case for the first
page in the vma like you did in your patch.

   End result: simpler, more straightforward code.

 - Move the growsup/down helper functions to <linux/mm.h>, since the
/proc code really wants to use them too. That means that the
"vma_stack_continue()" function (which now got split up into two
cases, for the up/down cases) is now entirely just an internal helper
function - nobody else uses it, and the real interface are the
"stack_guard_page_xyz()"  functions. Renamed to be simpler.

 - changed that naming of those stack_guard_page functions to use
_start and _end instead of growsup/growsdown, since it actually takes
the start or the end of the page as the argument (to match the
semantics of the afore-mentioned helpers)

 - and finally, make /proc/<pid>/maps use these helpers for both the
up/down case, so now /proc/self/maps should work well for the growsup
case too.

Hmm?

The only oddish case is IA64 that actually has a stack that grows
*both* up and down. That means that I could make up a stack mapping
that has a single virtual page in it, that is both the start *and* the
end page. Now /proc/self/maps would actually show such a mapping with
"negative" size. That's interesting.

It would be easy enough to have a "if (end < start) end = start" there
for that case, but maybe it's actually interesting information.

Regardless, I'd like to hear whether this patch really does work on
PA-RISC and especially IA64. I think those are the only cases that
have a GROWSUP stack. And the IA64 case that supports both is the most
interesting, everybody else does just one or the other.

                    Linus
 fs/proc/task_mmu.c |   12 +++++++-----
 include/linux/mm.h |   24 +++++++++++++++++++++++-
 mm/memory.c        |   16 +++++++---------
 3 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2e7addfd9803..318d8654989b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -214,7 +214,7 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
 	int flags = vma->vm_flags;
 	unsigned long ino = 0;
 	unsigned long long pgoff = 0;
-	unsigned long start;
+	unsigned long start, end;
 	dev_t dev = 0;
 	int len;
 
@@ -227,13 +227,15 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
 
 	/* We don't show the stack guard page in /proc/maps */
 	start = vma->vm_start;
-	if (vma->vm_flags & VM_GROWSDOWN)
-		if (!vma_stack_continue(vma->vm_prev, vma->vm_start))
-			start += PAGE_SIZE;
+	if (stack_guard_page_start(vma, start))
+		start += PAGE_SIZE;
+	end = vma->vm_end;
+	if (stack_guard_page_end(vma, end))
+		end -= PAGE_SIZE;
 
 	seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n",
 			start,
-			vma->vm_end,
+			end,
 			flags & VM_READ ? 'r' : '-',
 			flags & VM_WRITE ? 'w' : '-',
 			flags & VM_EXEC ? 'x' : '-',
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2348db26bc3d..6507dde38b16 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1011,11 +1011,33 @@ int set_page_dirty_lock(struct page *page);
 int clear_page_dirty_for_io(struct page *page);
 
 /* Is the vma a continuation of the stack vma above it? */
-static inline int vma_stack_continue(struct vm_area_struct *vma, unsigned long addr)
+static inline int vma_growsdown(struct vm_area_struct *vma, unsigned long addr)
 {
 	return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN);
 }
 
+static inline int stack_guard_page_start(struct vm_area_struct *vma,
+					     unsigned long addr)
+{
+	return (vma->vm_flags & VM_GROWSDOWN) &&
+		(vma->vm_start == addr) &&
+		!vma_growsdown(vma->vm_prev, addr);
+}
+
+/* Is the vma a continuation of the stack vma below it? */
+static inline int vma_growsup(struct vm_area_struct *vma, unsigned long addr)
+{
+	return vma && (vma->vm_start == addr) && (vma->vm_flags & VM_GROWSUP);
+}
+
+static inline int stack_guard_page_end(struct vm_area_struct *vma,
+					   unsigned long addr)
+{
+	return (vma->vm_flags & VM_GROWSUP) &&
+		(vma->vm_end == addr) &&
+		!vma_growsup(vma->vm_next, addr);
+}
+
 extern unsigned long move_page_tables(struct vm_area_struct *vma,
 		unsigned long old_addr, struct vm_area_struct *new_vma,
 		unsigned long new_addr, unsigned long len);
diff --git a/mm/memory.c b/mm/memory.c
index 27f425378112..61e66f026563 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1412,9 +1412,8 @@ no_page_table:
 
 static inline int stack_guard_page(struct vm_area_struct *vma, unsigned long addr)
 {
-	return (vma->vm_flags & VM_GROWSDOWN) &&
-		(vma->vm_start == addr) &&
-		!vma_stack_continue(vma->vm_prev, addr);
+	return stack_guard_page_start(vma, addr) ||
+	       stack_guard_page_end(vma, addr+PAGE_SIZE);
 }
 
 /**
@@ -1551,12 +1550,6 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			continue;
 		}
 
-		/*
-		 * For mlock, just skip the stack guard page.
-		 */
-		if ((gup_flags & FOLL_MLOCK) && stack_guard_page(vma, start))
-			goto next_page;
-
 		do {
 			struct page *page;
 			unsigned int foll_flags = gup_flags;
@@ -1573,6 +1566,11 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 				int ret;
 				unsigned int fault_flags = 0;
 
+				/* For mlock, just skip the stack guard page. */
+				if (foll_flags & FOLL_MLOCK) {
+					if (stack_guard_page(vma, start))
+						goto next_page;
+				}
 				if (foll_flags & FOLL_WRITE)
 					fault_flags |= FAULT_FLAG_WRITE;
 				if (nonblocking)

[Index of Archives]     [Linux Kernel]     [Sparc Linux]     [DCCP]     [Linux ARM]     [Yosemite News]     [Linux SCSI]     [Linux x86_64]     [Linux for Ham Radio]

  Powered by Linux