On Wed, Sep 11, 2024 at 6:42 PM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
>
> * Yang Shi <shy828301@xxxxxxxxx> [240911 21:08]:
> > On Wed, Sep 11, 2024 at 5:50 PM Helge Deller <deller@xxxxxx> wrote:
> > >
> > > On 9/12/24 01:05, Liam R. Howlett wrote:
> > > > * Yang Shi <shy828301@xxxxxxxxx> [240911 18:16]:
> > > >> On Wed, Sep 11, 2024 at 12:49 PM Liam R. Howlett
> > > >> <Liam.Howlett@xxxxxxxxxx> wrote:
> > > >>>
> > > >>> * Helge Deller <deller@xxxxxxxxxx> [240911 15:20]:
> > > >>>> This is a RFC to change the behaviour of mmap(MAP_STACK) to be
> > > >>>> sufficient to map memory for usage as stack on all architectures.
> > > >>>> Currently MAP_STACK is a no-op on Linux, and instead MAP_GROWSDOWN
> > > >>>> has to be used.
> > > >>>> To clarify, here is the relevant info from the mmap() man page:
> > > >>>>
> > > >>>> MAP_GROWSDOWN
> > > >>>> This flag is used for stacks. It indicates to the kernel virtual
> > > >>>> memory system that the mapping should extend downward in memory. The
> > > >>>> return address is one page lower than the memory area that is
> > > >>>> actually created in the process's virtual address space. Touching an
> > > >>>> address in the "guard" page below the mapping will cause the mapping
> > > >>>> to grow by a page. This growth can be repeated until the mapping
> > > >>>> grows to within a page of the high end of the next lower mapping,
> > > >>>> at which point touching the "guard" page will result in a SIGSEGV
> > > >>>> signal.
> > > >>>>
> > > >>>> MAP_STACK (since Linux 2.6.27)
> > > >>>> Allocate the mapping at an address suitable for a process or thread
> > > >>>> stack.
> > > >>>>
> > > >>>> This flag is currently a no-op on Linux. However, by employing this
> > > >>>> flag, applications can ensure that they transparently obtain support
> > > >>>> if the flag is implemented in the future. Thus, it is used in the
> > > >>>> glibc threading implementation to allow for the fact that
> > > >>>> some architectures may (later) require special treatment for
> > > >>>> stack allocations. A further reason to employ this flag is
> > > >>>> portability: MAP_STACK exists (and has an effect) on some
> > > >>>> other systems (e.g., some of the BSDs).
> > > >>>>
> > > >>>> The reason to suggest this change is, that on the parisc architecture the
> > > >>>> stack grows upwards. As such, using solely the MAP_GROWSDOWN flag will not
> > > >>>> work. Note that there exists no MAP_GROWSUP flag.
> > > >>>> By changing the behaviour of MAP_STACK to mark the memory area with the
> > > >>>> VM_STACK bit (which is VM_GROWSUP or VM_GROWSDOWN depending on the
> > > >>>> architecture) the MAP_STACK flag does exactly what people would expect on
> > > >>>> all platforms.
> > > >>>>
> > > >>>> This change should have no negative side-effect, as all code which
> > > >>>> used mmap(MAP_GROWSDOWN | MAP_STACK) still work as before.
> > > >>>>
> > > >>>> Signed-off-by: Helge Deller <deller@xxxxxx>
> > > >>>>
> > > >>>> diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > >>>> index bcb201ab7a41..66bc72a0cb19 100644
> > > >>>> --- a/include/linux/mman.h
> > > >>>> +++ b/include/linux/mman.h
> > > >>>> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > > >>>> return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> > > >>>> _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> > > >>>> _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> > > >>>> + _calc_vm_trans(flags, MAP_STACK, VM_STACK ) |
> > > >>>
> > > >>> Right now MAP_STACK can be used to set VM_NOHUGEPAGE, but this will
> > > >>> change the user interface to create a vma that will grow. I'm not
> > > >>> entirely sure this is okay?
> > > >>
> > > >> AFAICT, I don't see this is a problem. Currently huge page also skips
> > > >> the VMAs with VM_GROWS* flags set. See vma_is_temporary_stack().
> > > >> __thp_vma_allowable_orders() returns 0 if the vma is a temporary
> > > >> stack.
> > > >
> > > > If someone is using MAP_STACK to avoid having a huge page, they will
> > > > also get a mapping that grows - which is different than what happens
> > > > today.
> > > >
> > > > I'm not saying that's right, but someone could be abusing the existing
> > > > flag and this will change the behaviour.
> > >
> > > Wouldn't a plain mmap() followed by madvise(MADV_NOHUGEPAGE) do exactly that?
> > > Why abusing MAP_STACK for that?
> >
> > Different sources and reports showed having huge pages for stack
> > mapping hurts performance. A lot of applications, for example, pthread
> > lib, allocate stack with MAP_STACK and they don't call MADV_NOHUGEPAGE
> > on stack mapping.
> >
>
> It makes sense to have a stack with NOHUGEPAGE, but does anyone use
> MAP_STACK to avoid the extra syscall to madv to set it on mappings that
> are NOT stacks which would now become stack-like with this change?

AFAICT, I'm not aware of such a use case. It is definitely not
recommended and is a misuse of MAP_STACK. I don't see how we can prevent
this in the kernel other than documenting it properly.

>
> ...
> > > >>>> _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
> > > >>>> arch_calc_vm_flag_bits(flags);
> > > >>>> }
> > >
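
For anyone following along, here is a small userspace model of the flag translation being discussed. It is only a sketch, not the kernel code: the SK_* constants are illustrative values assumed for this example (the real ones live in the kernel headers and differ by architecture), sk_trans() is a simplified stand-in for _calc_vm_trans(), and the MAP_LOCKED/MAP_SYNC translations are left out. The vm_stack parameter models VM_STACK, which per the patch description is VM_GROWSUP or VM_GROWSDOWN depending on the architecture.

```c
/* Illustrative single-bit flag values, assumed for this sketch only. */
#define SK_MAP_GROWSDOWN 0x00000100UL
#define SK_MAP_STACK     0x00020000UL
#define SK_VM_GROWSDOWN  0x00000100UL
#define SK_VM_GROWSUP    0x00000200UL
#define SK_VM_NOHUGEPAGE 0x40000000UL

/* Simplified stand-in for _calc_vm_trans(): if map_bit is set in
 * flags, return vm_bit, otherwise 0. */
static unsigned long sk_trans(unsigned long flags, unsigned long map_bit,
                              unsigned long vm_bit)
{
    return (flags & map_bit) ? vm_bit : 0;
}

/* Translation as quoted in the thread: MAP_STACK only sets
 * VM_NOHUGEPAGE, and only MAP_GROWSDOWN makes the vma grow. */
static unsigned long sk_flag_bits_current(unsigned long flags)
{
    return sk_trans(flags, SK_MAP_GROWSDOWN, SK_VM_GROWSDOWN) |
           sk_trans(flags, SK_MAP_STACK, SK_VM_NOHUGEPAGE);
}

/* Translation with the RFC applied: MAP_STACK additionally sets the
 * architecture's stack-growth bit, passed in here as vm_stack. */
static unsigned long sk_flag_bits_proposed(unsigned long flags,
                                           unsigned long vm_stack)
{
    return sk_flag_bits_current(flags) |
           sk_trans(flags, SK_MAP_STACK, vm_stack);
}
```

In this model, sk_flag_bits_current(SK_MAP_STACK) yields only SK_VM_NOHUGEPAGE, while sk_flag_bits_proposed(SK_MAP_STACK, SK_VM_GROWSDOWN) also carries the growth bit, which is the behaviour change Liam is asking about. Passing SK_MAP_GROWSDOWN | SK_MAP_STACK gives the same result from both functions on a grows-down architecture, matching the claim that existing callers are unaffected.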