On Wed, Aug 28, 2024 at 01:59:18PM -0700, Charlie Jenkins wrote:
> On Wed, Aug 28, 2024 at 02:31:42PM -0400, Liam R. Howlett wrote:
> > * Charlie Jenkins <charlie@xxxxxxxxxxxx> [240828 01:49]:
> > > Some applications rely on placing data in free bits of addresses
> > > allocated by mmap. Various architectures (e.g. x86, arm64, powerpc)
> > > restrict the address returned by mmap to be less than the maximum
> > > address space, unless the hint address is greater than this value.
> >
> > Wait, what arch(s) allows for greater than the max? The passed hint
> > should be where we start searching, but we go to the lower limit then
> > start at the hint and search up (or vice-versa on the directions).
> >
>
> I worded this awkwardly. On arm64 there is a page-table boundary at 48
> bits and at 52 bits. On x86 the boundaries are at 48 bits and 57 bits.
> The max value mmap is able to return on arm64 is 48 bits if the hint
> address uses 48 bits or less, even if the architecture supports 5-level
> paging and thus addresses can be 52 bits. Applications can opt in to
> using up to 52 bits in an address by using a hint address greater than
> 48 bits. x86 has the same behavior but with 57 bits instead of 52.
>
> The reason this exists is that some applications arbitrarily replace
> bits in virtual addresses with data, on the assumption that the address
> will not be using any of the bits above bit 48 in the virtual address.
> As hardware with larger address spaces was released, x86 decided to
> build safety guards into the kernel to allow the applications that made
> these assumptions to continue to work on this different hardware.
>
> This causes all applications that use a hint address to silently be
> restricted to 48-bit addresses. The goal of this flag is to have a way
> for applications to explicitly request how many bits they want mmap to
> use.
>
> > I don't understand how unmapping works on a higher address; we would
> > fail to free it on termination of the application.
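
To make the existing opt-in behavior described above concrete, here is a
minimal userspace sketch. It only uses the standard mmap()/munmap() calls;
the helper names addr_bits() and map_with_hint() are made up for
illustration, and it assumes a 64-bit system. It does not use the proposed
flag. The address that comes back is whatever the kernel chose, and it is
unmapped with the usual munmap() call regardless of how high it is.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: number of significant bits in an address (0 for NULL). */
static unsigned addr_bits(const void *p)
{
	uintptr_t v = (uintptr_t)p;
	unsigned bits = 0;

	while (v) {
		bits++;
		v >>= 1;
	}
	return bits;
}

/*
 * Hypothetical helper: anonymous private mapping with a caller-supplied
 * hint. Without MAP_FIXED the hint is only a suggestion; per the
 * description above, on x86 and arm64 a hint above the default boundary
 * is also what opts the caller in to the larger address space.
 * Returns MAP_FAILED on error, like mmap() itself.
 */
static void *map_with_hint(void *hint, size_t len)
{
	return mmap(hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

An application would call map_with_hint() with a NULL or low hint to stay
below the default boundary, or with a hint above it to opt in, and could
then use addr_bits() on the result to see how many bits it was handed.
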
> >
> > Also, there are archs that map outside of the VMAs, which are freed by
> > freeing from the prev->vm_end to next->vm_start, so I don't understand
> > what that looks like in this reality as well.
> >
> > >
> > > On arm64 this barrier is at 52 bits and on x86 it is at 56 bits. This
> > > flag allows applications a way to specify exactly how many bits they
> > > want to be left unused by mmap. This eliminates the need for
> > > applications to know the page table hierarchy of the system to be able
> > > to reason which addresses mmap will be allowed to return.
> >
> > But, why do they need to know today? We have a limit for this don't we?
>
> The limit is different for different architectures. On x86 the limit is
> 57 bits, and on arm64 it is 52 bits. So in the theoretical case that an
> application requires 10 bits free in a virtual address, the application
> would always work on arm64 regardless of the hint address, but on x86 if
> the hint address is greater than 48 bits then the application will not
> work.
>
> The goal of this flag is to have consistent and tunable behavior of
> mmap() when it is desired to ensure that mmap() only returns addresses
> that use some number of bits.
>
> >
> > Also, these upper limits are how some archs use the upper bits that you
> > are trying to use.
> >
>
> It does not eliminate the existing behavior of the architectures to
> place these upper limits; it instead provides a way to have consistent
> behavior across all architectures.
>
> > >
> > > ---
> > > riscv made this feature of mmap returning addresses less than the hint
> > > address the default behavior. This was in contrast to the implementation
> > > of x86/arm64 that have a single boundary at the 5-level page table
> > > region. However this restriction proved too great -- the reduced
> > > address space when using a hint address was too small.
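
To spell out the 10-bit case mentioned above: an application that stores a
10-bit tag in the top of a 64-bit pointer needs every address to fit in
54 bits. The sketch below is illustrative only; TAG_BITS and the helper
names are assumptions, not taken from any real application. A 52-bit
address always fits, while an address that uses bit 56 does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_BITS  10u                       /* bits the application wants free */
#define ADDR_BITS (64u - TAG_BITS)          /* 54 bits left for the address */
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)
#define TAG_MASK  ((UINT64_C(1) << TAG_BITS) - 1)

/* True if addr leaves the top TAG_BITS bits clear. */
static bool addr_fits(uint64_t addr)
{
	return (addr & ~ADDR_MASK) == 0;
}

/* Store a tag in the top bits; the caller must check addr_fits() first. */
static uint64_t tag_pack(uint64_t addr, uint16_t tag)
{
	return (addr & ADDR_MASK) | (((uint64_t)tag & TAG_MASK) << ADDR_BITS);
}

/* Recover the original address from a packed value. */
static uint64_t tag_addr(uint64_t packed)
{
	return packed & ADDR_MASK;
}

/* Recover the tag from a packed value. */
static uint16_t tag_value(uint64_t packed)
{
	return (uint16_t)(packed >> ADDR_BITS);
}
```

If addr_fits() fails for an address returned by mmap(), tag_pack() would
silently drop address bits, which is the failure mode the address
restriction is meant to prevent.
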
> >
> > Yes, the hint is used to group things close together so it would
> > literally be random chance on if you have enough room or not (aslr and
> > all).
> >
> > >
> > > A patch for riscv [1] reverts the behavior that broke userspace. This
> > > series serves to make this feature available to all architectures.
> >
> > I don't fully understand this statement, you say it broke userspace so
> > now you are porting it to everyone? This reads as if you are breaking
> > the userspace on all architectures :)
>
> It was the default for mmap on riscv. The difference here is that it is
> now enabled by a flag instead. Instead of making the flag specific to
> riscv, I figured that other architectures might find it useful as well.
>
> >
> > If you fail to find room below, then your application fails as there is
> > no way to get the upper bits you need. It would be better to fix this
> > in userspace - if your application is returned too high an address, then
> > free it and exit because it's going to fail anyways.
> >
>
> This flag is trying to define an API that is more robust than the
> current behavior on x86 and arm64, which implicitly restrict mmap()
> addresses to 48 bits. A solution could be to just write in the docs that
> mmap() will always exhaust all addresses below the hint address before
> returning an address that is above the hint address. However a flag that
> defines this behavior seems more intuitive.
>
> > >
> > > I have only tested on riscv and x86.
> >
> > This should be an RFC then.
>
> Fair enough.
>
> >
> > > There is a tremendous amount of
> > > duplicated code in mmap so the implementations across architectures I
> > > believe should be mostly consistent. I added this feature to all
> > > architectures that implement either
> > > arch_get_mmap_end()/arch_get_mmap_base() or
> > > arch_get_unmapped_area_topdown()/arch_get_unmapped_area(). I also added
> > > it to the default behavior for arch_get_mmap_end()/arch_get_mmap_base().
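
For comparison, the userspace-only approach suggested above (check the
returned address, free it and fail if it is too high) could look roughly
like the following. This is a sketch under assumptions: map_below_bits() is
a made-up name, it assumes a 64-bit system, and it uses only standard
mmap()/munmap(). Unlike a kernel-side flag, it cannot make the kernel
search lower; it can only reject the result after the fact.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes near hint and accept the mapping only
 * if the whole range lies below 2^max_bits. On rejection the mapping is
 * freed again. Returns NULL with errno set on failure.
 */
static void *map_below_bits(void *hint, size_t len, unsigned max_bits)
{
	void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;	/* errno already set by mmap() */

	if (max_bits < 64) {
		uintptr_t end = (uintptr_t)p + len;	/* one past the last byte */
		uintptr_t limit = (uintptr_t)1 << max_bits;

		if (end > limit) {
			/* Too high for the caller: free it and report failure. */
			munmap(p, len);
			errno = ENOMEM;
			return NULL;
		}
	}
	return p;
}
```

A caller that needs 10 free bits would pass max_bits = 54 and treat a NULL
return as fatal.
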
> >
> > Way too much duplicate code. We should be figuring out how to make this
> > all work with the same code.
> >
> > This is going to make the cloned code problem worse.
>
> That would require standardizing every architecture with the generic
> mmap() framework that arm64 has developed. That is far outside the scope
> of this patch, but would be a great area to research for each of the
> architectures that do not use the generic framework.

Thinking about this again, I could drop support for all architectures
that do not implement arch_get_mmap_base()/arch_get_mmap_end().

>
> - Charlie
>
> >
> > >
> > > Link: https://lore.kernel.org/lkml/20240826-riscv_mmap-v1-2-cd8962afe47f@xxxxxxxxxxxx/T/ [1]
> > >
> > > To: Arnd Bergmann <arnd@xxxxxxxx>
> > > To: Paul Walmsley <paul.walmsley@xxxxxxxxxx>
> > > To: Palmer Dabbelt <palmer@xxxxxxxxxxx>
> > > To: Albert Ou <aou@xxxxxxxxxxxxxxxxx>
> > > To: Catalin Marinas <catalin.marinas@xxxxxxx>
> > > To: Will Deacon <will@xxxxxxxxxx>
> > > To: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
> > > To: Nicholas Piggin <npiggin@xxxxxxxxx>
> > > To: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
> > > To: Naveen N Rao <naveen@xxxxxxxxxx>
> > > To: Muchun Song <muchun.song@xxxxxxxxx>
> > > To: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > To: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
> > > To: Vlastimil Babka <vbabka@xxxxxxx>
> > > To: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> > > To: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > > To: Ingo Molnar <mingo@xxxxxxxxxx>
> > > To: Borislav Petkov <bp@xxxxxxxxx>
> > > To: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > > To: x86@xxxxxxxxxx
> > > To: H. Peter Anvin <hpa@xxxxxxxxx>
> > > To: Huacai Chen <chenhuacai@xxxxxxxxxx>
> > > To: WANG Xuerui <kernel@xxxxxxxxxx>
> > > To: Russell King <linux@xxxxxxxxxxxxxxx>
> > > To: Thomas Bogendoerfer <tsbogend@xxxxxxxxxxxxxxxx>
> > > To: James E.J. Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
> > > To: Helge Deller <deller@xxxxxx>
> > > To: Alexander Gordeev <agordeev@xxxxxxxxxxxxx>
> > > To: Gerald Schaefer <gerald.schaefer@xxxxxxxxxxxxx>
> > > To: Heiko Carstens <hca@xxxxxxxxxxxxx>
> > > To: Vasily Gorbik <gor@xxxxxxxxxxxxx>
> > > To: Christian Borntraeger <borntraeger@xxxxxxxxxxxxx>
> > > To: Sven Schnelle <svens@xxxxxxxxxxxxx>
> > > To: Yoshinori Sato <ysato@xxxxxxxxxxxxxxxxxxxx>
> > > To: Rich Felker <dalias@xxxxxxxx>
> > > To: John Paul Adrian Glaubitz <glaubitz@xxxxxxxxxxxxxxxxxxx>
> > > To: David S. Miller <davem@xxxxxxxxxxxxx>
> > > To: Andreas Larsson <andreas@xxxxxxxxxxx>
> > > To: Shuah Khan <shuah@xxxxxxxxxx>
> > > To: Alexandre Ghiti <alexghiti@xxxxxxxxxxxx>
> > > Cc: linux-arch@xxxxxxxxxxxxxxx
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > > Cc: Palmer Dabbelt <palmer@xxxxxxxxxxxx>
> > > Cc: linux-riscv@xxxxxxxxxxxxxxxxxxx
> > > Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> > > Cc: linuxppc-dev@xxxxxxxxxxxxxxxx
> > > Cc: linux-mm@xxxxxxxxx
> > > Cc: loongarch@xxxxxxxxxxxxxxx
> > > Cc: linux-mips@xxxxxxxxxxxxxxx
> > > Cc: linux-parisc@xxxxxxxxxxxxxxx
> > > Cc: linux-s390@xxxxxxxxxxxxxxx
> > > Cc: linux-sh@xxxxxxxxxxxxxxx
> > > Cc: sparclinux@xxxxxxxxxxxxxxx
> > > Cc: linux-kselftest@xxxxxxxxxxxxxxx
> > > Signed-off-by: Charlie Jenkins <charlie@xxxxxxxxxxxx>
> > >
> > > ---
> > > Charlie Jenkins (16):
> > >       mm: Add MAP_BELOW_HINT
> > >       riscv: mm: Do not restrict mmap address based on hint
> > >       mm: Add flag and len param to arch_get_mmap_base()
> > >       mm: Add generic MAP_BELOW_HINT
> > >       riscv: mm: Support MAP_BELOW_HINT
> > >       arm64: mm: Support MAP_BELOW_HINT
> > >       powerpc: mm: Support MAP_BELOW_HINT
> > >       x86: mm: Support MAP_BELOW_HINT
> > >       loongarch: mm: Support MAP_BELOW_HINT
> > >       arm: mm: Support MAP_BELOW_HINT
> > >       mips: mm: Support MAP_BELOW_HINT
> > >       parisc: mm: Support MAP_BELOW_HINT
> > >       s390: mm: Support MAP_BELOW_HINT
> > >       sh: mm: Support MAP_BELOW_HINT
> > >       sparc: mm: Support MAP_BELOW_HINT
> > >       selftests/mm: Create MAP_BELOW_HINT test
> > >
> > >  arch/arm/mm/mmap.c                           | 10 ++++++++
> > >  arch/arm64/include/asm/processor.h           | 34 ++++++++++++++++++++++----
> > >  arch/loongarch/mm/mmap.c                     | 11 +++++++++
> > >  arch/mips/mm/mmap.c                          |  9 +++++++
> > >  arch/parisc/include/uapi/asm/mman.h          |  1 +
> > >  arch/parisc/kernel/sys_parisc.c              |  9 +++++++
> > >  arch/powerpc/include/asm/task_size_64.h      | 36 +++++++++++++++++++++++-----
> > >  arch/riscv/include/asm/processor.h           | 32 -------------------------
> > >  arch/s390/mm/mmap.c                          | 10 ++++++++
> > >  arch/sh/mm/mmap.c                            | 10 ++++++++
> > >  arch/sparc/kernel/sys_sparc_64.c             |  8 +++++++
> > >  arch/x86/kernel/sys_x86_64.c                 | 25 ++++++++++++++++---
> > >  fs/hugetlbfs/inode.c                         |  2 +-
> > >  include/linux/sched/mm.h                     | 34 ++++++++++++++++++++++++--
> > >  include/uapi/asm-generic/mman-common.h       |  1 +
> > >  mm/mmap.c                                    |  2 +-
> > >  tools/arch/parisc/include/uapi/asm/mman.h    |  1 +
> > >  tools/include/uapi/asm-generic/mman-common.h |  1 +
> > >  tools/testing/selftests/mm/Makefile          |  1 +
> > >  tools/testing/selftests/mm/map_below_hint.c  | 29 ++++++++++++++++++++++
> > >  20 files changed, 216 insertions(+), 50 deletions(-)
> > > ---
> > > base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
> > > change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
> > > --
> > > - Charlie
> > >