On Mon, 2024-08-19 at 10:08 +0300, Kirill A. Shutemov wrote:
> The init_transition_pgtable() function sets up transitional page tables.
> It ensures that the relocate_kernel() function is present in the
> identity mapping at the same location as in the kernel page tables.
> relocate_kernel() switches to the identity mapping, and the function
> must be present at the same location in the virtual address space before
> and after switching page tables.
>
> init_transition_pgtable() maps a copy of relocate_kernel() in
> image->control_code_page at the relocate_kernel() virtual address, but
> the original physical address of relocate_kernel() would also work.
>
> It is safe to use original relocate_kernel() physical address cannot be
> overwritten until swap_pages() is called, and the relocate_kernel()
> virtual address will not be used by then.
>
> Map the original relocate_kernel() at the relocate_kernel() virtual
> address in the identity mapping. It is preparation to replace the
> init_transition_pgtable() implementation with a call to
> kernel_ident_mapping_init().
>
> Note that while relocate_kernel() switches to the identity mapping, it
> does not flush global TLB entries (CR4.PGE is not cleared). This means
> that in most cases, the kernel still runs relocate_kernel() from the
> original physical address before the change.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> ---
>  arch/x86/kernel/machine_kexec_64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 9c9ac606893e..645690e81c2d 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -157,7 +157,7 @@ static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
>  	pte_t *pte;
>  
>  	vaddr = (unsigned long)relocate_kernel;
> -	paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
> +	paddr = __pa(relocate_kernel);
>  	pgd += pgd_index(vaddr);
>  	if (!pgd_present(*pgd)) {
>  		p4d = (p4d_t *)get_zeroed_page(GFP_KERNEL);

IIUC, this breaks KEXEC_JUMP (when image->preserve_context is true).

relocate_kernel() first saves a couple of registers and some other data,
such as the PA of the swap page, to the control page. Note that
VA_CONTROL_PAGE is used here to access the control page, so that data is
saved to the control page:

SYM_CODE_START_NOALIGN(relocate_kernel)
	UNWIND_HINT_END_OF_STACK
	ANNOTATE_NOENDBR
	/*
	 * %rdi indirection_page
	 * %rsi page_list
	 * %rdx start address
	 * %rcx preserve_context
	 * %r8  bare_metal
	 */

	...

	movq	PTR(VA_CONTROL_PAGE)(%rsi), %r11
	movq	%rsp, RSP(%r11)
	movq	%cr0, %rax
	movq	%rax, CR0(%r11)
	movq	%cr3, %rax
	movq	%rax, CR3(%r11)
	movq	%cr4, %rax
	movq	%rax, CR4(%r11)

	...

	/*
	 * get physical address of control page now
	 * this is impossible after page table switch
	 */
	movq	PTR(PA_CONTROL_PAGE)(%rsi), %r8

	/* get physical address of page table now too */
	movq	PTR(PA_TABLE_PAGE)(%rsi), %r9

	/* get physical address of swap page now */
	movq	PTR(PA_SWAP_PAGE)(%rsi), %r10

	/* save some information for jumping back */
	movq	%r9, CP_PA_TABLE_PAGE(%r11)
	movq	%r10, CP_PA_SWAP_PAGE(%r11)
	movq	%rdi, CP_PA_BACKUP_PAGES_MAP(%r11)

	...

And after jumping back from the second kernel, relocate_kernel() tries to
restore the saved data:

	...
	/* get the re-entry point of the peer system */
	movq	0(%rsp), %rbp
	leaq	relocate_kernel(%rip), %r8		<--------- (*)
	movq	CP_PA_SWAP_PAGE(%r8), %r10
	movq	CP_PA_BACKUP_PAGES_MAP(%r8), %rdi
	movq	CP_PA_TABLE_PAGE(%r8), %rax
	movq	%rax, %cr3
	lea	PAGE_SIZE(%r8), %rsp
	call	swap_pages
	movq	$virtual_mapped, %rax
	pushq	%rax
	ANNOTATE_UNRET_SAFE
	ret
	int3
SYM_CODE_END(identity_mapped)

Note that the code at (*) uses the VA of relocate_kernel() to access the
control page. IIUC, that means if we map the VA of relocate_kernel() to the
original PA where relocate_kernel() itself resides, the code above will
never be able to read that data back, since it was saved to the control
page.

Did I miss anything?