Re: [PATCH] [RFC] mm: mmap: Allow mmap(MAP_STACK) to map growable stack

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Helge Deller <deller@xxxxxx> [240911 22:10]:
> On 9/12/24 03:32, Liam R. Howlett wrote:
> > * Helge Deller <deller@xxxxxx> [240911 20:51]:
> > > On 9/12/24 01:05, Liam R. Howlett wrote:
> > > > * Yang Shi <shy828301@xxxxxxxxx> [240911 18:16]:
> > > > > On Wed, Sep 11, 2024 at 12:49 PM Liam R. Howlett
> > > > > <Liam.Howlett@xxxxxxxxxx> wrote:
> > > > > > 
> > > > > > * Helge Deller <deller@xxxxxxxxxx> [240911 15:20]:
> > > > > > > This is a RFC to change the behaviour of mmap(MAP_STACK) to be
> > > > > > > sufficient to map memory for usage as stack on all architectures.
> > > > > > > Currently MAP_STACK is a no-op on Linux, and instead MAP_GROWSDOWN
> > > > > > > has to be used.
> > > > > > > To clarify, here is the relevant info from the mmap() man page:
> > > > > > > 
> > > > > > > MAP_GROWSDOWN
> > > > > > >      This flag is used for stacks. It indicates to the kernel virtual
> > > > > > >      memory system that the mapping should extend downward in memory.  The
> > > > > > >      return address is one page lower than the memory area that is
> > > > > > >      actually created in the process's virtual address space.  Touching an
> > > > > > >      address in the "guard" page below the mapping will cause the mapping
> > > > > > >      to grow by a page. This growth can be repeated until the mapping
> > > > > > >      grows to within a page of the high end of the next lower mapping,
> > > > > > >      at which point touching the "guard" page will result in a SIGSEGV
> > > > > > >      signal.
> > > > > > > 
> > > > > > > MAP_STACK (since Linux 2.6.27)
> > > > > > >      Allocate the mapping at an address suitable for a process or thread
> > > > > > >      stack.
> > > > > > > 
> > > > > > >      This flag is currently a no-op on Linux. However, by employing this
> > > > > > >      flag, applications can ensure that they transparently obtain support
> > > > > > >      if the flag is implemented in the future. Thus, it is used in the
> > > > > > >      glibc threading implementation to allow for the fact that
> > > > > > >      some architectures may (later) require special treatment for
> > > > > > >      stack allocations. A further reason to employ this flag is
> > > > > > >      portability: MAP_STACK exists (and has an effect) on some
> > > > > > >      other systems (e.g., some of the BSDs).
> > > > > > > 
> > > > > > > The reason to suggest this change is, that on the parisc architecture the
> > > > > > > stack grows upwards. As such, using solely the MAP_GROWSDOWN flag will not
> > > > > > > work. Note that there exists no MAP_GROWSUP flag.
> > > > > > > By changing the behaviour of MAP_STACK to mark the memory area with the
> > > > > > > VM_STACK bit (which is VM_GROWSUP or VM_GROWSDOWN depending on the
> > > > > > > architecture) the MAP_STACK flag does exactly what people would expect on
> > > > > > > all platforms.
> > > > > > > 
> > > > > > > This change should have no negative side-effect, as all code which
> > > > > > > used mmap(MAP_GROWSDOWN | MAP_STACK) still work as before.
> > > > > > > 
> > > > > > > Signed-off-by: Helge Deller <deller@xxxxxx>
> > > > > > > 
> > > > > > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > > > > > index bcb201ab7a41..66bc72a0cb19 100644
> > > > > > > --- a/include/linux/mman.h
> > > > > > > +++ b/include/linux/mman.h
> > > > > > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > > > > > >         return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> > > > > > >                _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> > > > > > >                _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > > > > > > +            _calc_vm_trans(flags, MAP_STACK,      VM_STACK     ) |
> > > > > > 
> > > > > > Right now MAP_STACK can be used to set VM_NOHUGEPAGE, but this will
> > > > > > change the user interface to create a vma that will grow.  I'm not
> > > > > > entirely sure this is okay?
> > > > > 
> > > > > AFAICT, I don't see this is a problem. Currently huge page also skips
> > > > > the VMAs with VM_GROWS* flags set. See vma_is_temporary_stack().
> > > > > __thp_vma_allowable_orders() returns 0 if the vma is a temporary
> > > > > stack.
> > > > 
> > > > If someone is using MAP_STACK to avoid having a huge page, they will
> > > > also get a mapping that grows - which is different than what happens
> > > > today.
> > > > 
> > > > I'm not saying that's right, but someone could be abusing the existing
> > > > flag and this will change the behaviour.
> > > 
> > > Wouldn't a plain mmap() followed by madvise(MADV_NOHUGEPAGE) do exactly that?
> > > Why abusing MAP_STACK for that?
> > 
> > I can think of two answers:
> > 1. An error that has worked without issues so far
> > 2. One less system call
> > 
> > I'm not saying this really is a blocker, but the change is not without
> > risk as it does change behaviour the user could see.
> > 
> > Interestingly enough, the man page is incorrect as it is written because
> > the flag is not strictly a no-op; it ensures no huge pages.  So the
> > feature of applying VM_NOHUGEPAGE with the use of MAP_STACK is not
> > documented today.
> 
> Yes.
> 
> > What happens to call that use the mmap(MAP_GROWSDOWN | MAP_STACK) on
> > parisc today?
> 
> Today, without my patch, on parisc the area is then flagged VM_GROWSDOWN only.
> In that case, stack expansion will not work, as it checks for VM_GROWSUP.
> 
> > How does this change with your VM_STACK change?  Wouldn't this result
> > in failed mappings?
> > VM_GROWSDOWN | VM_GROWSUP would fail in do_mmap(), and these would be> set if you map MAP_STACK to VM_STACK which is defined as VM_GROWSUP?
> 
> Right, with my change, the area will be tagged VM_GROWSUP and VM_GROWSDOWN.
> Due to VM_GROWSUP stack expansion works.
> The mapping doesn't fail in do_mmap(), because stacks are not file-mapped
> or shared or droppable. They should be mapped with MAP_PRIVATE, right?

Oh my, yes.  So now you will have a stack that can expand in either
direction, but it's okay because one direction isn't checked.  I sure
hope the rest of the code is correctly #ifdef'ed for this.

> 
> 
> Another option is to introduce an alias, e.g.:
> #define MAP_STACK_EXPANDABLE	MAP_GROWSDOWN
> and then

I don't like either of these options.

I guess you could also detect the MAP_STACK and MAP_GROWSDOWN and change
the flags for parisc, which I also don't like, but since parisc is the
only arch using this it's hard to justify a change that may cause issues
in other archs.

A quick grep shows we set VM_STACK_DEFAULT_FLAGS in x86 and powerpc,
which could affect what happens here with your change.  I am concerned
about the bleeding of other flags through this change.


> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index bcb201ab7a41..6a7ec3e0078a 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -153,7 +153,7 @@ calc_vm_prot_bits(unsigned long prot, unsigned long pkey)
>  static inline unsigned long
>  calc_vm_flag_bits(unsigned long flags)
>  {
> -       return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> +       return _calc_vm_trans(flags, MAP_STACK_EXPANDABLE,  VM_STACK ) |
>                _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>                _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
>                _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> 
> This simply uses the existing MAP_GROWSDOWN flag to mark a stack,
> is independend on the stack growth direction and is fully backwards
> compatible to existing behaviour.
> 
> I prefer my initial proposal to change MAP_STACK though, as it's
> simpler and clearer.
> 
> Helge
> 





[Index of Archives]     [Linux SoC]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux