On Mon, Feb 27, 2023 at 02:29:41PM -0800, Rick Edgecombe wrote:
> The x86 Control-flow Enforcement Technology (CET) feature includes a new
> type of memory called shadow stack. This shadow stack memory has some
> unusual properties, which require some core mm changes to function
> properly.
>
> One of the properties is that the shadow stack pointer (SSP), which is a
> CPU register that points to the shadow stack like the stack pointer points
> to the stack, can't be pointing outside of the 32 bit address space when
> the CPU is executing in 32 bit mode. It is desirable to prevent executing
> in 32 bit mode when shadow stack is enabled because the kernel can't easily
> support 32 bit signals.
>
> On x86 it is possible to transition to 32 bit mode without any special
> interaction with the kernel, by doing a "far call" to a 32 bit segment.
> So the shadow stack implementation can use this address space behavior
> as a feature, by enforcing that shadow stack memory is always crated
                                                                ^^^^^^^

"created" and I'd say "mapped" or "allocated" here. "Created" sounds weird.

> outside of the 32 bit address space. This way userspace will trigger a
> general protection fault which will in turn trigger a segfault if it
> tries to transition to 32 bit mode with shadow stack enabled.
>
> This provides a clean error generating border for the user if they try
> attempt to do 32 bit mode shadow stack, rather than leave the kernel in a
> half working state for userspace to be surprised by.
>
> So to allow future shadow stack enabling patches to map shadow stacks
> out of the 32 bit address space, introduce MAP_ABOVE4G. The behavior

I guess this needs to be documented in the mmap() manpage too.

> is pretty much like MAP_32BIT, except that it has the opposite address
> range. The are a few differences though.
>
> If both MAP_32BIT and MAP_ABOVE4G are provided, the kernel will use the
> MAP_ABOVE4G behavior. Like MAP_32BIT, MAP_ABOVE4G is ignored in a 32 bit
> syscall.
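
For reference, the "ignored in a 32 bit syscall" rule can be sketched in
plain userspace C, mirroring the low-limit selection in the hunk quoted
further down. This is only an illustration, not the kernel code: every
SKETCH_* name and sketch_low_limit() are hypothetical, the flag value is
assumed for the example, and the page size is a 4 KiB stand-in for
PAGE_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins so the sketch builds outside the kernel. */
#define SKETCH_PAGE_SIZE   0x1000ULL       /* assumes 4 KiB pages */
#define SKETCH_SZ_4G       0x100000000ULL  /* 4 GiB, what SZ_4G denotes */
#define SKETCH_MAP_ABOVE4G 0x80u           /* assumed flag value, illustration only */

/*
 * Pick the lowest address the top-down search may return.
 * The flag only raises the limit for a 64 bit syscall; in a
 * 32 bit syscall it is ignored and the default limit applies.
 */
static uint64_t sketch_low_limit(bool in_32bit_syscall, unsigned int flags)
{
	if (!in_32bit_syscall && (flags & SKETCH_MAP_ABOVE4G))
		return SKETCH_SZ_4G;

	return SKETCH_PAGE_SIZE;
}
```

With the flag set in a 64 bit syscall the search floor moves to 4 GiB; in
every other case it stays at one page, same as before the patch.
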
>
> Since the default search behavior is top down, the normal kaslr base can
> be used for MAP_ABOVE4G. This is unlike MAP_32BIT which has to add it's
                                                                     ^^^^

"its"

> own randomization in the bottom up case.

...

> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
> index 8cc653ffdccd..06378b5682c1 100644
> --- a/arch/x86/kernel/sys_x86_64.c
> +++ b/arch/x86/kernel/sys_x86_64.c
> @@ -193,7 +193,11 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
>  
>  	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
>  	info.length = len;
> -	info.low_limit = PAGE_SIZE;
> +	if (!in_32bit_syscall() && (flags & MAP_ABOVE4G))
> +		info.low_limit = 0x100000000;

We have a human readable define for that: SZ_4G

> +	else
> +		info.low_limit = PAGE_SIZE;
> +
>  	info.high_limit = get_mmap_base(0);
>  
>  	/*

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette