On Tue, Oct 08, 2024 at 04:20:13PM +0200, Jann Horn wrote: > On Tue, Oct 8, 2024 at 1:40 PM Lorenzo Stoakes > <lorenzo.stoakes@xxxxxxxxxx> wrote: > > This is touching mm/mmap.c, please ensure to cc- the reviewers (me and > > Liam, I have cc'd Liam here) as listed in MAINTAINERS when submitting > > patches for this file. > > Ah, my bad, sorry about that. > > > Also this seems like a really speculative 'please discuss' change so should > > be an RFC imo. > > > > On Tue, Oct 08, 2024 at 12:55:39AM +0200, Jann Horn wrote: > > > As explained in the comment block this change adds, we can't tell what > > > userspace's intent is when the stack grows towards an inaccessible VMA. > > > > > > I have a (highly contrived) C testcase for 32-bit x86 userspace with glibc > > > that mixes malloc(), pthread creation, and recursion in just the right way > > > such that the main stack overflows into malloc() arena memory. > > > (Let me know if you want me to share that.) > > > > You are claiming a fixes from 2017 and cc'ing stable on a non-RFC so > > yeah... we're going to need more than your word for it :) we will want to > > be really sure this is a thing before we backport to basically every > > stable kernel. > > > > Surely this isn't that hard to demonstrate though? You mmap something > > PROT_NONE just stack gap below the stack, then intentionally extend stack > > to it, before mprotect()'ing the PROT_NONE region? > > I've attached my test case that demonstrates this using basically only > malloc, free, pthread_create() and recursion, plus a bunch of ugly > read-only gunk and synchronization. It assumes that it runs under > glibc, as a 32-bit x86 binary. Usage: Thanks! > > $ clang -O2 -fstack-check -m32 -o grow-32 grow-32.c -pthread -ggdb && > for i in {0..10}; do ./grow-32; done > corrupted thread_obj2 at depth 190528 > corrupted thread_obj2 at depth 159517 > corrupted thread_obj2 at depth 209777 > corrupted thread_obj2 at depth 200119 > corrupted thread_obj2 at depth 208093 > corrupted thread_obj2 at depth 167705 > corrupted thread_obj2 at depth 234523 > corrupted thread_obj2 at depth 174528 > corrupted thread_obj2 at depth 223823 > corrupted thread_obj2 at depth 199816 > grow-32: malloc failed: Cannot allocate memory > > This demonstrates that it is possible for a userspace program that is > just using basic libc functionality, and whose only bug is unbounded > recursion, to corrupt memory despite being built with -fstack-check. > > As you said, to just demonstrate the core issue in a more contrived > way, you can also use a simpler example: > > $ cat basic-grow-repro.c > #include <err.h> > #include <stdlib.h> > #include <sys/mman.h> > > #define STACK_POINTER() ({ void *__stack; asm volatile("mov %%rsp, > %0":"=r"(__stack)); __stack; }) > > int main(void) { > char *ptr = (char*)( (unsigned long)(STACK_POINTER() - > (1024*1024*4)/*4MiB*/) & ~0xfffUL ); > if (mmap(ptr, 0x1000, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) != ptr) > err(1, "mmap"); > *(volatile char *)(ptr + 0x1000); /* expand stack */ > if (mprotect(ptr, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC)) > err(1, "mprotect"); > system("cat /proc/$PPID/maps"); > } > $ gcc -o basic-grow-repro basic-grow-repro.c > $ ./basic-grow-repro > 5600a0fef000-5600a0ff0000 r--p 00000000 fd:01 28313737 > [...]/basic-grow-repro > 5600a0ff0000-5600a0ff1000 r-xp 00001000 fd:01 28313737 > [...]/basic-grow-repro > 5600a0ff1000-5600a0ff2000 r--p 00002000 fd:01 28313737 > [...]/basic-grow-repro > 5600a0ff2000-5600a0ff3000 r--p 00002000 fd:01 28313737 > [...]/basic-grow-repro > 5600a0ff3000-5600a0ff4000 rw-p 00003000 fd:01 28313737 > [...]/basic-grow-repro > 7f9a88553000-7f9a88556000 rw-p 00000000 00:00 0 > 7f9a88556000-7f9a8857c000 r--p 00000000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f9a8857c000-7f9a886d2000 r-xp 00026000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f9a886d2000-7f9a88727000 r--p 0017c000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f9a88727000-7f9a8872b000 r--p 001d0000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f9a8872b000-7f9a8872d000 rw-p 001d4000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f9a8872d000-7f9a8873a000 rw-p 00000000 00:00 0 > 7f9a88754000-7f9a88756000 rw-p 00000000 00:00 0 > 7f9a88756000-7f9a8875a000 r--p 00000000 00:00 0 [vvar] > 7f9a8875a000-7f9a8875c000 r-xp 00000000 00:00 0 [vdso] > 7f9a8875c000-7f9a8875d000 r--p 00000000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f9a8875d000-7f9a88782000 r-xp 00001000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f9a88782000-7f9a8878c000 r--p 00026000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f9a8878c000-7f9a8878e000 r--p 00030000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f9a8878e000-7f9a88790000 rw-p 00032000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7fff84664000-7fff84665000 rwxp 00000000 00:00 0 > 7fff84665000-7fff84a67000 rw-p 00000000 00:00 0 [stack] > $ > > > Though, while writing the above reproducer, I noticed another dodgy > scenario regarding the stack gap: MAP_FIXED_NOREPLACE apparently > ignores the stack guard region, because it only checks for VMA > intersection, see this example: Oh my. > > $ cat basic-grow-repro-ohno.c > #include <err.h> > #include <stdio.h> > #include <stdlib.h> > #include <sys/mman.h> > > #define STACK_POINTER() ({ void *__stack; asm volatile("mov %%rsp, > %0":"=r"(__stack)); __stack; }) > > int main(void) { > setbuf(stdout, NULL); > char *ptr = (char*)( (unsigned long)(STACK_POINTER() - > (1024*1024*4)/*4MiB*/) & ~0xfffUL ); > *(volatile char *)(ptr + 0x1000); /* expand stack to just above ptr */ > > printf("trying to map at %p\n", ptr); > system("cat /proc/$PPID/maps;echo"); > if (mmap(ptr, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC, > MAP_FIXED_NOREPLACE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0) != ptr) > err(1, "mmap"); > system("cat /proc/$PPID/maps"); > } > $ gcc -o basic-grow-repro-ohno basic-grow-repro-ohno.c > $ ./basic-grow-repro-ohno > trying to map at 0x7ffc344ca000 > 560ee371d000-560ee371e000 r--p 00000000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee371e000-560ee371f000 r-xp 00001000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee371f000-560ee3720000 r--p 00002000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee3720000-560ee3721000 r--p 00002000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee3721000-560ee3722000 rw-p 00003000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 7f0d636ed000-7f0d636f0000 rw-p 00000000 00:00 0 > 7f0d636f0000-7f0d63716000 r--p 00000000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d63716000-7f0d6386c000 r-xp 00026000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d6386c000-7f0d638c1000 r--p 0017c000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d638c1000-7f0d638c5000 r--p 001d0000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d638c5000-7f0d638c7000 rw-p 001d4000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d638c7000-7f0d638d4000 rw-p 00000000 00:00 0 > 7f0d638ee000-7f0d638f0000 rw-p 00000000 00:00 0 > 7f0d638f0000-7f0d638f4000 r--p 00000000 00:00 0 [vvar] > 7f0d638f4000-7f0d638f6000 r-xp 00000000 00:00 0 [vdso] > 7f0d638f6000-7f0d638f7000 r--p 00000000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d638f7000-7f0d6391c000 r-xp 00001000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d6391c000-7f0d63926000 r--p 00026000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d63926000-7f0d63928000 r--p 00030000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d63928000-7f0d6392a000 rw-p 00032000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7ffc344cb000-7ffc348cd000 rw-p 00000000 00:00 0 [stack] > > 560ee371d000-560ee371e000 r--p 00000000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee371e000-560ee371f000 r-xp 00001000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee371f000-560ee3720000 r--p 00002000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee3720000-560ee3721000 r--p 00002000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 560ee3721000-560ee3722000 rw-p 00003000 fd:01 28313842 > [...]/basic-grow-repro-ohno > 7f0d636ed000-7f0d636f0000 rw-p 00000000 00:00 0 > 7f0d636f0000-7f0d63716000 r--p 00000000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d63716000-7f0d6386c000 r-xp 00026000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d6386c000-7f0d638c1000 r--p 0017c000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d638c1000-7f0d638c5000 r--p 001d0000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d638c5000-7f0d638c7000 rw-p 001d4000 fd:01 3417714 > /usr/lib/x86_64-linux-gnu/libc.so.6 > 7f0d638c7000-7f0d638d4000 rw-p 00000000 00:00 0 > 7f0d638ee000-7f0d638f0000 rw-p 00000000 00:00 0 > 7f0d638f0000-7f0d638f4000 r--p 00000000 00:00 0 [vvar] > 7f0d638f4000-7f0d638f6000 r-xp 00000000 00:00 0 [vdso] > 7f0d638f6000-7f0d638f7000 r--p 00000000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d638f7000-7f0d6391c000 r-xp 00001000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d6391c000-7f0d63926000 r--p 00026000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d63926000-7f0d63928000 r--p 00030000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7f0d63928000-7f0d6392a000 rw-p 00032000 fd:01 3409055 > /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 > 7ffc344ca000-7ffc344cb000 rwxp 00000000 00:00 0 > 7ffc344cb000-7ffc348cd000 rw-p 00000000 00:00 0 [stack] > $ > > That could also be bad: MAP_FIXED_NOREPLACE exists, from what I > understand, partly so that malloc implementations can use it to grow > heap memory chunks (though glibc doesn't use it, I'm not sure who > actually uses it that way). We wouldn't want a malloc implementation > to grow a heap memory chunk until it is directly adjacent to a stack. It seems... weird to use it that way as you couldn't be sure you weren't overwriting another VMA. > > > > I don't know of any specific scenario where this is actually exploitable, > > > but it seems like it could be a security problem for sufficiently unlucky > > > userspace. > > > > > > I believe we should ensure that, as long as code is compiled with something > > > like -fstack-check, a stack overflow in it can never cause the main stack > > > to overflow into adjacent heap memory. > > > > > > My fix effectively reverts the behavior for !vma_is_accessible() VMAs to > > > the behavior before commit 1be7107fbe18 ("mm: larger stack guard gap, > > > between vmas"), so I think it should be a fairly safe change even in > > > case A. > > > > > > Fixes: 561b5e0709e4 ("mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack") > > > Cc: stable@xxxxxxxxxxxxxxx > > > Signed-off-by: Jann Horn <jannh@xxxxxxxxxx> > > > --- > > > I have tested that Libreoffice still launches after this change, though > > > I don't know if that means anything. > > > > > > Note that I haven't tested the growsup code. > > > --- > > > mm/mmap.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++------- > > > 1 file changed, 46 insertions(+), 7 deletions(-) > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index dd4b35a25aeb..971bfd6c1cea 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -1064,10 +1064,12 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address) > > > gap_addr = TASK_SIZE; > > > > > > next = find_vma_intersection(mm, vma->vm_end, gap_addr); > > > - if (next && vma_is_accessible(next)) { > > > - if (!(next->vm_flags & VM_GROWSUP)) > > > + if (next && !(next->vm_flags & VM_GROWSUP)) { > > > + /* see comments in expand_downwards() */ > > > + if (vma_is_accessible(prev)) > > > + return -ENOMEM; > > > + if (address == next->vm_start) > > > return -ENOMEM; > > > - /* Check that both stack segments have the same anon_vma? */ > > > > I hate that we even maintain this for one single arch I believe at this point? > > Looks that way, just parisc? > > It would be so nice if we could somehow just get rid of this concept > of growing stacks in general... > > > > } > > > > > > if (next) > > > @@ -1155,10 +1157,47 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address) > > > /* Enforce stack_guard_gap */ > > > prev = vma_prev(&vmi); > > > /* Check that both stack segments have the same anon_vma? */ > > > - if (prev) { > > > - if (!(prev->vm_flags & VM_GROWSDOWN) && > > > - vma_is_accessible(prev) && > > > - (address - prev->vm_end < stack_guard_gap)) > > > + if (prev && !(prev->vm_flags & VM_GROWSDOWN) && > > > + (address - prev->vm_end < stack_guard_gap)) { > > > + /* > > > + * If the previous VMA is accessible, this is the normal case > > > + * where the main stack is growing down towards some unrelated > > > + * VMA. Enforce the full stack guard gap. > > > + */ > > > + if (vma_is_accessible(prev)) > > > + return -ENOMEM; > > > + > > > + /* > > > + * If the previous VMA is not accessible, we have a problem: > > > + * We can't tell what userspace's intent is. > > > + * > > > + * Case A: > > > + * Maybe userspace wants to use the previous VMA as a > > > + * "guard region" at the bottom of the main stack, in which case > > > + * userspace wants us to grow the stack until it is adjacent to > > > + * the guard region. Apparently some Java runtime environments > > > + * and Rust do that? > > > + * That is kind of ugly, and in that case userspace really ought > > > + * to ensure that the stack is fully expanded immediately, but > > > + * we have to handle this case. > > > > Yeah we can't break userspace on this, no doubt somebody is relying on this > > _somewhere_. > > It would have to be a new user who appeared after commit 1be7107fbe18. > And they'd have to install a "guard vma" somewhere below the main > stack, and they'd have to care so much about the size of the stack > that a single page makes a difference. You did say 'Apparently some Java runtime environments and Rust do that' though right? Or am I misunderstanding? > > > That said, I wish we disallowed this altogether regardless of accessibility. > > > > > + * > > > + * Case B: > > > + * But maybe the previous VMA is entirely unrelated to the stack > > > + * and is only *temporarily* PROT_NONE. For example, glibc > > > + * malloc arenas create a big PROT_NONE region and then > > > + * progressively mark parts of it as writable. > > > + * In that case, we must not let the stack become adjacent to > > > + * the previous VMA. Otherwise, after the region later becomes > > > + * writable, a stack overflow will cause the stack to grow into > > > + * the previous VMA, and we won't have any stack gap to protect > > > + * against this. > > > > Should be careful with terminology here, an mprotect() will not allow a > > merge so by 'grow into' you mean that a stack VMA could become immediately > > adjacent to a non-stack VMA prior to it which was later made writable. > > > > Perhaps I am being pedantic... > > Ah, sorry, I worded that very confusingly. By "a stack overflow will > cause the stack to grow into the previous VMA", I meant that the stack > pointer moves into the previous VMA and the program uses part of the > previous VMA as stack memory, clobbering whatever was stored there > before. That part was not meant to talk about a change of VMA bounds. Sure, yes this is what I assumed. Might be worth clarifying that just to be _totally_ clear. > > > > + * > > > + * As an ugly tradeoff, enforce a single-page gap. > > > + * A single page will hopefully be small enough to not be > > > + * noticed in case A, while providing the same level of > > > + * protection in case B that normal userspace threads get. > > > + */ > > > + if (address == prev->vm_end) > > > return -ENOMEM; > > > > Ugh, yuck. Not a fan of this at all. Feels like a dreadful hack. > > Oh, I agree, I just didn't see a better way to do it. > > > You do raise an important point here, but it strikes me that we should be > > doing this check in the mprotect()/mmap() MAP_FIXED scenarios where it > > shouldn't be too costly to check against the next VMA (which we will be > > obtaining anyway for merge checks)? > > > > That way we don't need a hack like this, and can just disallow the > > operation. That'd probably be as liable to break the program as an -ENOMEM > > on a stack expansion would... > > Hmm... yeah, I guess that would work. If someone hits this case, it > would probably be less obvious to the programmer what went wrong based > on the error code, but on the other hand, it would give userspace a > slightly better chance to recover from the issue... > > I guess I can see if I can code that up. Thanks!