* Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:

> This patch addresses shortcoming in current boot process on machines
> that supports 5-level paging.
>
> If bootloader enables 64-bit mode with 4-level paging, we need to
> switch over to 5-level paging. The switching requires disabling paging.
> It works fine if kernel itself is loaded below 4G.
>
> If bootloader put the kernel above 4G (not sure if anybody does this),
> we would loose control as soon as paging is disabled as code becomes
> unreachable.
>
> This patch implements trampoline in lower memory to handle this
> situation.
>
> Apart from trampoline itself we also need place to store top level page
> table in lower memory as we don't have a way to load 64-bit value into
> CR3 from 32-bit mode. We only really need 8-bytes there as we only use
> the very first entry of the page table.
>
> place_trampoline() would choose an address for the trampoline page.
> The implementation is based on reserve_bios_regions(). We take a page
> next to end of lowmem.
>
> We only need the page for very short time, until main kernel image
> setup its own page tables.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> ---
>  arch/x86/boot/compressed/head_64.S | 87 ++++++++++++++++++++++++++------------
>  arch/x86/boot/compressed/misc.c    | 25 +++++++++++
>  2 files changed, 84 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index cefe4958fda9..961c72755986 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -288,8 +288,23 @@ ENTRY(startup_64)
>  	leaq	boot_stack_end(%rbx), %rsp
>
>  #ifdef CONFIG_X86_5LEVEL
> +/*
> + * We need trampoline in lower memory switch from 4- to 5-level paging for
> + * cases when bootloader put kernel above 4G, but didn't enable 5-level paging
> + * for us.
> + *
> + * We also have to have top page table in lower memory as we don't have a way
> + * to load 64-bit value into CR3 from 32-bit mode. We only need 8-bytes there
> + * as we only use the very first entry of the page table.
> + *
> + * The same page can be used to place both trampoline code and top level page
> + * table. place_trampoline() will find suitable place for the trampoline page.
> + * Code will be placed with offset 0x100 from beginning of the page.
> + */
> +#define LVL5_TRAMPOLINE_CODE	0x100
> +
>  	/* Preserve RBX across CPUID */
> -	movq	%rbx, %r8
> +	movq	%rbx, %r15
>
>  	/* Check if leaf 7 is supported */
>  	xorl	%eax, %eax
> @@ -307,9 +322,6 @@ ENTRY(startup_64)
>  	andl	$(1 << 16), %ecx
>  	jz	lvl5
>
> -	/* Restore RBX */
> -	movq	%r8, %rbx
> -
>  	/* Check if 5-level paging has already been enabled */
>  	movq	%cr4, %rax
>  	testl	$X86_CR4_LA57, %eax
> @@ -323,34 +335,53 @@ ENTRY(startup_64)
>  	 * long mode would trigger #GP. So we need to switch off long mode
>  	 * first.
>  	 *
> -	 * NOTE: This is not going to work if bootloader put us above 4G
> -	 * limit.
> +	 * We use trampoline in lower memory to handle situation when
> +	 * bootloader put the kernel image above 4G.
>  	 *
>  	 * The first step is go into compatibility mode.
>  	 */
>
> -	/* Clear additional page table */
> -	leaq	lvl5_pgtable(%rbx), %rdi
> -	xorq	%rax, %rax
> -	movq	$(PAGE_SIZE/8), %rcx
> -	rep	stosq
> +	/*
> +	 * Find sitable place for trampoline.
> +	 * The address will be stored in RBX.
> +	 */
> +	call	place_trampoline
> +	movq	%rax, %rbx
> +
> +	/* Preserve RSI, to be used by movsb below */
> +	movq	%rsi, %r14
> +
> +	/* Copy trampoline code in place */
> +	leaq	lvl5_trampoline_src(%rip), %rsi
> +	leaq	LVL5_TRAMPOLINE_CODE(%rbx), %rdi
> +	movq	$(lvl5_trampoline_end - lvl5_trampoline_src), %rcx
> +	rep	movsb
> +
> +	/* Restore RSI */
> +	movq	%r14, %rsi

Yeah, so first most of this code should be moved from assembly to C. Any reason
why that cannot be done?

Cleanups like that are a precondition to adding this patch or other 5-level
paging complications like the dynamic boot time switching.

Thanks,

	Ingo
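
As a rough illustration of the request above, the "copy trampoline code in
place" step from the quoted assembly could be sketched in C roughly as follows.
This is not code from the patch: setup_trampoline_page() and its parameters are
hypothetical names, the buffer stands in for the page that place_trampoline()
would return, and putting the single top-level page table entry at offset 0 of
that page is an assumption drawn from the comment in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE            4096
#define LVL5_TRAMPOLINE_CODE 0x100  /* code offset within the page, as in the patch */

/*
 * Hypothetical helper (not part of the patch).
 *
 * page:      stands in for the address returned by place_trampoline()
 * src, len:  stand in for lvl5_trampoline_src .. lvl5_trampoline_end
 * top_entry: the one top-level page table entry; storing it at offset 0
 *            of the page is an assumption, not something the diff shows
 */
void setup_trampoline_page(uint8_t *page, const void *src, size_t len,
                           uint64_t top_entry)
{
    /* The code has to fit between offset 0x100 and the end of the page. */
    assert(len <= PAGE_SIZE - LVL5_TRAMPOLINE_CODE);

    /* Only the very first entry of the top-level table is used (8 bytes). */
    memcpy(page, &top_entry, sizeof(top_entry));

    /* C counterpart of the 'rep movsb' copy in the assembly version. */
    memcpy(page + LVL5_TRAMPOLINE_CODE, src, len);
}
```

Doing the copy in C would also remove the need to save and restore RSI around
the string instruction, since the compiler handles register allocation.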