On 10/29/24 15:26, Luck, Tony wrote:
>>      /*
>>       * This needs to follow the FPU initializtion, since EFI depends on it.
>> +     * It also needs to precede the CR pinning setup, because we need to be
>> +     * able to temporarily clear the CR4.LASS bit in order to execute the
>> +     * set_virtual_address_map call, which resides in lower addresses and
>> +     * would trip LASS if enabled.
>>       */
>
> Why are the temporary mappings used to patch kernel code in the lower half
> of the virtual address space?

I was just asking myself the same thing.

The upper half is always mapped uniformly.  When you create an MM you
copy the 256->511th pgd entries verbatim from the init_mm's pgd.  If you
map something in a pgd entry <=255, it isn't (by default) visible to
other mm's.  That's why a new mm also tends to get you a new process.

> But couldn't we map into upper half and do some/all of:
>
> 1) Trust that there aren't stupid bugs that dereference random pointers into the
>    temporary mapping?
> 2) Make a "this CPU only" mapping
> 3) Avoid preemption while patching so there is no need for TLB shootdown
>    by other CPUs when the temporary mapping is torn down, just flush local TLB.

It's about enforcing W^X semantics.  We should limit the time and scope
where mappings have some data both writeable and executable.

If we poke text in the upper half of the address space, any kernel
thread might be exploited to write to what will soon be executable.

If we do it in the lower half in its own mm, you have to compromise the
thread doing the text poking after the mapping is created but before it
is invalidated.  With LASS you *ALSO* need to do it in the STAC/CLAC
window, which is smaller than the window when the TLB is valid.

*IF* we switched things to do text poking in the upper half of the
address space, we'd probably want to find a completely unused PGD entry.
I'm not sure off the top of my head if we have a good one for that or if
it's worth the trouble.
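
To make the pgd split above a bit more concrete, here is a toy user-space
model of it.  This is only a sketch, not kernel code: it assumes 4-level
paging (512 top-level entries, each covering 512 GiB), and every model_*
name is made up for illustration.  It shows that a mapping placed in a
lower-half entry of one mm is invisible to another mm, while upper-half
entries copied from the init pgd show up in both.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed 4-level x86-64 paging: 512 top-level entries, 512 GiB each. */
#define MODEL_PGDIR_SHIFT   39
#define MODEL_PTRS_PER_PGD  512
#define MODEL_KERNEL_START  256   /* first entry of the upper half */

typedef uint64_t model_pgd_t;     /* hypothetical stand-in for a pgd entry */

/* Which top-level entry does a virtual address fall into? */
static unsigned int model_pgd_index(uint64_t addr)
{
    return (unsigned int)((addr >> MODEL_PGDIR_SHIFT) &
                          (MODEL_PTRS_PER_PGD - 1));
}

/*
 * Model of creating a new mm: the lower half starts out empty, the
 * upper half (entries 256..511) is copied verbatim from the init pgd.
 */
static void model_new_mm(model_pgd_t *pgd, const model_pgd_t *init_pgd)
{
    assert(pgd && init_pgd);
    memset(pgd, 0, MODEL_KERNEL_START * sizeof(*pgd));
    memcpy(pgd + MODEL_KERNEL_START, init_pgd + MODEL_KERNEL_START,
           (MODEL_PTRS_PER_PGD - MODEL_KERNEL_START) * sizeof(*pgd));
}

/* In this model an address is "visible" if its top-level entry is non-zero. */
static int model_visible(const model_pgd_t *pgd, uint64_t addr)
{
    return pgd[model_pgd_index(addr)] != 0;
}
```

In this model, a text-poking mm that installs its temporary mapping in
entry 0 is the only address space where that mapping can be reached.
Anything installed in entries 256..511 of the init pgd before the copy is
reachable from every mm, which is the exposure the lower-half approach
avoids.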