Hi Quentin, On Tue, Mar 02, 2021 at 02:59:45PM +0000, Quentin Perret wrote: > When memory protection is enabled, the EL2 code needs the ability to > create and manage its own page-table. To do so, introduce a new set of > hypercalls to bootstrap a memory management system at EL2. > > This leads to the following boot flow in nVHE Protected mode: > > 1. the host allocates memory for the hypervisor very early on, using > the memblock API; > > 2. the host creates a set of stage 1 page-table for EL2, installs the > EL2 vectors, and issues the __pkvm_init hypercall; > > 3. during __pkvm_init, the hypervisor re-creates its stage 1 page-table > and stores it in the memory pool provided by the host; > > 4. the hypervisor then extends its stage 1 mappings to include a > vmemmap in the EL2 VA space, hence allowing to use the buddy > allocator introduced in a previous patch; > > 5. the hypervisor jumps back in the idmap page, switches from the > host-provided page-table to the new one, and wraps up its > initialization by enabling the new allocator, before returning to > the host. > > 6. the host can free the now unused page-table created for EL2, and > will now need to issue hypercalls to make changes to the EL2 stage 1 > mappings instead of modifying them directly. > > Note that for the sake of simplifying the review, this patch focuses on > the hypervisor side of things. In other words, this only implements the > new hypercalls, but does not make use of them from the host yet. The > host-side changes will follow in a subsequent patch. > > Credits to Will for __pkvm_init_switch_pgd. > > Co-authored-by: Will Deacon <will@xxxxxxxxxx> > Signed-off-by: Will Deacon <will@xxxxxxxxxx> > Signed-off-by: Quentin Perret <qperret@xxxxxxxxxx> > --- > arch/arm64/include/asm/kvm_asm.h | 4 + > arch/arm64/include/asm/kvm_host.h | 7 + > arch/arm64/include/asm/kvm_hyp.h | 8 ++ > arch/arm64/include/asm/kvm_pgtable.h | 2 + > arch/arm64/kernel/image-vars.h | 16 +++ > arch/arm64/kvm/hyp/Makefile | 2 +- > arch/arm64/kvm/hyp/include/nvhe/mm.h | 71 ++++++++++ > arch/arm64/kvm/hyp/nvhe/Makefile | 4 +- > arch/arm64/kvm/hyp/nvhe/hyp-init.S | 31 +++++ > arch/arm64/kvm/hyp/nvhe/hyp-main.c | 49 +++++++ > arch/arm64/kvm/hyp/nvhe/mm.c | 173 ++++++++++++++++++++++++ > arch/arm64/kvm/hyp/nvhe/setup.c | 195 +++++++++++++++++++++++++++ > arch/arm64/kvm/hyp/pgtable.c | 2 - > arch/arm64/kvm/hyp/reserved_mem.c | 92 +++++++++++++ > arch/arm64/mm/init.c | 3 + > 15 files changed, 654 insertions(+), 5 deletions(-) This mostly looks good to me, but in a patch this size I was bound to spot a few niggles. It is _huge_! > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S > index c631e29fb001..bc56ea92b812 100644 > --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S > @@ -244,4 +244,35 @@ alternative_else_nop_endif > > SYM_CODE_END(__kvm_handle_stub_hvc) > > +SYM_FUNC_START(__pkvm_init_switch_pgd) > + /* Turn the MMU off */ > + pre_disable_mmu_workaround > + mrs x2, sctlr_el2 > + bic x3, x2, #SCTLR_ELx_M > + msr sctlr_el2, x3 > + isb > + > + tlbi alle2 > + > + /* Install the new pgtables */ > + ldr x3, [x0, #NVHE_INIT_PGD_PA] > + phys_to_ttbr x4, x3 > +alternative_if ARM64_HAS_CNP > + orr x4, x4, #TTBR_CNP_BIT > +alternative_else_nop_endif > + msr ttbr0_el2, x4 > + > + /* Set the new stack pointer */ > + ldr x0, [x0, #NVHE_INIT_STACK_HYP_VA] > + mov sp, x0 > + > + /* And turn the MMU back on! */ > + dsb nsh > + isb > + msr sctlr_el2, x2 > + ic iallu > + isb Comparing with the new-fangled set_sctlr_el1 macro we have, this sequence isn't quite right. Probably best to introduce set_sctlr_el2, and implement that and the existing macro in terms of set_sctlr_elX or something like that. > +void __noreturn __pkvm_init_finalise(void) > +{ > + struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data); > + struct kvm_cpu_context *host_ctxt = &host_data->host_ctxt; > + unsigned long nr_pages, reserved_pages, pfn; > + int ret; > + > + /* Now that the vmemmap is backed, install the full-fledged allocator */ > + pfn = hyp_virt_to_pfn(hyp_pgt_base); > + nr_pages = hyp_s1_pgtable_pages(); > + reserved_pages = hyp_early_alloc_nr_used_pages(); > + ret = hyp_pool_init(&hpool, pfn, nr_pages, reserved_pages); > + if (ret) > + goto out; > + > + pkvm_pgtable_mm_ops.zalloc_page = hyp_zalloc_hyp_page; > + pkvm_pgtable_mm_ops.phys_to_virt = hyp_phys_to_virt; > + pkvm_pgtable_mm_ops.virt_to_phys = hyp_virt_to_phys; > + pkvm_pgtable_mm_ops.get_page = hyp_get_page; > + pkvm_pgtable_mm_ops.put_page = hyp_put_page; > + pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops; Can you do: pkvm_pgtable_mm_ops = (struct kvm_pgtable_mm_ops) { .zalloc_page = hyp_zalloc_hyp_page, .phys_to_virt = ..., ... }; here? > + > +out: > + /* > + * We tail-called to here from handle___pkvm_init() and will not return, > + * so make sure to propagate the return value to the host. > + */ > + cpu_reg(host_ctxt, 1) = ret; > + > + __host_enter(host_ctxt); > +} > + > +int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus, > + unsigned long *per_cpu_base, u32 hyp_va_bits) > +{ > + struct kvm_nvhe_init_params *params; > + void *virt = hyp_phys_to_virt(phys); > + void (*fn)(phys_addr_t params_pa, void *finalize_fn_va); > + int ret; > + > + if (phys % PAGE_SIZE || size % PAGE_SIZE) > + return -EINVAL; Either PAGE_ALIGNED or '& ~PAGE_MASK' would be better than spelling this with '%', I reckon. Anyway, other than these nits: Acked-by: Will Deacon <will@xxxxxxxxxx> Will _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm