On Mon, May 9, 2011 at 8:57 AM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > Hmm. One thing that strikes me is this problem also implies that the > /proc/self/maps file is wrong for the GROWSUP case, isn't it? > > So I think we should not just apply your lock fix, but then *also* > apply something like this: Actually, I think we might be better off with something like this. It makes a few more changes: - move the stack guard page checking in __get_user_pages() into the rare case (ie we didn't find a page), since that's the only case we care about (the thing about the guard page is that don't want to call "handle_mm_fault()"). As a result, it's off any path where we can possibly care about performance, so we might as well have a nice helper function for both the grow-up and grow-down cases, instead of trying to be clever and only look at the grow-down case for the first page in the vma like you did in your patch. End result: simpler, more straightforward code. - Move the growsup/down helper functions to <linux/mm.h>, since the /proc code really wants to use them too. That means that the "vma_stack_continue()" function (which now got split up into two cases, for the up/down cases) is now entirely just an internal helper function - nobody else uses it, and the real interface are the "stack_guard_page_xyz()" functions. Renamed to be simpler. - changed that naming of those stack_guard_page functions to use _start and _end instead of growsup/growsdown, since it actually takes the start or the end of the page as the argument (to match the semantics of the afore-mentioned helpers) - and finally, make /proc/<pid>/maps use these helpers for both the up/down case, so now /proc/self/maps should work well for the growsup case too. Hmm? The only oddish case is IA64 that actually has a stack that grows *both* up and down. That means that I could make up a stack mapping that has a single virtual page in it, that is both the start *and* the end page. Now /proc/self/maps would actually show such a mapping with "negative" size. That's interesting. It would be easy enough to have a "if (end < start) end = start" there for that case, but maybe it's actually interesting information. Regardless, I'd like to hear whether this patch really does work on PA-RISC and especially IA64. I think those are the only cases that have a GROWSUP stack. And the IA64 case that supports both is the most interesting, everybody else does just one or the other. Linus
fs/proc/task_mmu.c | 12 +++++++----- include/linux/mm.h | 24 +++++++++++++++++++++++- mm/memory.c | 16 +++++++--------- 3 files changed, 37 insertions(+), 15 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 2e7addfd9803..318d8654989b 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -214,7 +214,7 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) int flags = vma->vm_flags; unsigned long ino = 0; unsigned long long pgoff = 0; - unsigned long start; + unsigned long start, end; dev_t dev = 0; int len; @@ -227,13 +227,15 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) /* We don't show the stack guard page in /proc/maps */ start = vma->vm_start; - if (vma->vm_flags & VM_GROWSDOWN) - if (!vma_stack_continue(vma->vm_prev, vma->vm_start)) - start += PAGE_SIZE; + if (stack_guard_page_start(vma, start)) + start += PAGE_SIZE; + end = vma->vm_end; + if (stack_guard_page_end(vma, end)) + end -= PAGE_SIZE; seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n", start, - vma->vm_end, + end, flags & VM_READ ? 'r' : '-', flags & VM_WRITE ? 'w' : '-', flags & VM_EXEC ? 'x' : '-', diff --git a/include/linux/mm.h b/include/linux/mm.h index 2348db26bc3d..6507dde38b16 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1011,11 +1011,33 @@ int set_page_dirty_lock(struct page *page); int clear_page_dirty_for_io(struct page *page); /* Is the vma a continuation of the stack vma above it? */ -static inline int vma_stack_continue(struct vm_area_struct *vma, unsigned long addr) +static inline int vma_growsdown(struct vm_area_struct *vma, unsigned long addr) { return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN); } +static inline int stack_guard_page_start(struct vm_area_struct *vma, + unsigned long addr) +{ + return (vma->vm_flags & VM_GROWSDOWN) && + (vma->vm_start == addr) && + !vma_growsdown(vma->vm_prev, addr); +} + +/* Is the vma a continuation of the stack vma below it? */ +static inline int vma_growsup(struct vm_area_struct *vma, unsigned long addr) +{ + return vma && (vma->vm_start == addr) && (vma->vm_flags & VM_GROWSUP); +} + +static inline int stack_guard_page_end(struct vm_area_struct *vma, + unsigned long addr) +{ + return (vma->vm_flags & VM_GROWSUP) && + (vma->vm_end == addr) && + !vma_growsup(vma->vm_next, addr); +} + extern unsigned long move_page_tables(struct vm_area_struct *vma, unsigned long old_addr, struct vm_area_struct *new_vma, unsigned long new_addr, unsigned long len); diff --git a/mm/memory.c b/mm/memory.c index 27f425378112..61e66f026563 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1412,9 +1412,8 @@ no_page_table: static inline int stack_guard_page(struct vm_area_struct *vma, unsigned long addr) { - return (vma->vm_flags & VM_GROWSDOWN) && - (vma->vm_start == addr) && - !vma_stack_continue(vma->vm_prev, addr); + return stack_guard_page_start(vma, addr) || + stack_guard_page_end(vma, addr+PAGE_SIZE); } /** @@ -1551,12 +1550,6 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, continue; } - /* - * For mlock, just skip the stack guard page. - */ - if ((gup_flags & FOLL_MLOCK) && stack_guard_page(vma, start)) - goto next_page; - do { struct page *page; unsigned int foll_flags = gup_flags; @@ -1573,6 +1566,11 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm, int ret; unsigned int fault_flags = 0; + /* For mlock, just skip the stack guard page. */ + if (foll_flags & FOLL_MLOCK) { + if (stack_guard_page(vma, start)) + goto next_page; + } if (foll_flags & FOLL_WRITE) fault_flags |= FAULT_FLAG_WRITE; if (nonblocking)