On Wed, 2024-11-27 at 19:00 +0000, David Woodhouse wrote:
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
>
> The set_p4d() and set_pgd() functions (in 4-level or 5-level page table setups
> respectively) assume that the root page table is actually an 8KiB allocation,
> with the userspace root immediately after the kernel root page table (so that
> the former can enforce NX on all the subordinate pages, which are actually
> shared).
>
> However, users of the kernel_ident_mapping_init() code do not give it an 8KiB
> allocation for its PGD. Both swsusp_arch_resume() and acpi_mp_setup_reset()
> allocate only a single 4KiB page. The kexec code on x86_64 currently gets
> away with it purely by chance, because it allocates 8KiB for its "control
> code page" and then actually uses the first half for the PGD, then copies the
> actual trampoline code into the second half only after the identmap code has
> finished scribbling over it.
>
> Fix this by defining a _PAGE_NOPTISHADOW bit (which can use the same bit as
> _PAGE_SAVED_DIRTY since one is only for the PGD/P4D root and the other is
> exclusively for leaf PTEs). This instructs __pti_set_user_pgtbl() not to
> write to the userspace 'shadow' PGD.
>
> Strictly, the _PAGE_NOPTISHADOW bit doesn't need to be written out to the
> actual page tables; since __pti_set_user_pgtbl() returns the value to be
> written to the kernel page table, it could be filtered out. But there seems
> to be no benefit to actually doing so.

Ping? I think the rest of the kexec-debug series is in fairly good shape; this is the only part I'm slightly unsure about.