Hi James, On Tue, 05 May 2020 17:03:15 +0100, James Morse <james.morse@xxxxxxx> wrote: > > Hi Marc, > > On 22/04/2020 13:00, Marc Zyngier wrote: > > From: Christoffer Dall <christoffer.dall@xxxxxxx> > > > > As we are about to reuse our stage 2 page table manipulation code for > > shadow stage 2 page tables in the context of nested virtualization, we > > are going to manage multiple stage 2 page tables for a single VM. > > > > This requires some pretty invasive changes to our data structures, > > which moves the vmid and pgd pointers into a separate structure and > > change pretty much all of our mmu code to operate on this structure > > instead. > > > > The new structure is called struct kvm_s2_mmu. > > > > There is no intended functional change by this patch alone. > > It's not obvious to me that VTCR_EL2.T0SZ is a per-vm thing, today > the size of the IPA space comes from the VMM, its not a > hardware/compile-time property. Where does the vEL2's T0SZ go? > ... but using this for nested guests would 'only' cause a > translation fault, it would still need handling in the emulation > code. So making it per-vm should be simpler. My reasoning is that this VTCR defines the virtual HW, and the guest's own VTCR_EL2 is just another guest system register. It is the role of the NV code to compose the two in a way that makes sense (delivering translation faults if the NV guest's S2 output range doesn't fit in the host's view of the VM IPA range). > But accessing VTCR is why the stage2_dissolve_p?d() stuff still > needs the kvm pointer, hence the backreference... it might be neater > to push the vtcr properties into kvm_s2_mmu that way you could drop > the kvm backref, and only things that take vm-wide locks would need > the kvm pointer. But I don't think it matters. That's an interesting consideration. I'll have a look. > I think I get it. I can't see anything that should be the other > vm/vcpu pointer. > > Reviewed-by: James Morse <james.morse@xxxxxxx> Thanks! > Some boring fiddly stuff: > > [...] > > > @@ -125,24 +123,24 @@ static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm, > > } > > } > > > > -static void __hyp_text __tlb_switch_to_host(struct kvm *kvm, > > +static void __hyp_text __tlb_switch_to_host(struct kvm_s2_mmu *mmu, > > struct tlb_inv_context *cxt) > > { > > if (has_vhe()) > > - __tlb_switch_to_host_vhe(kvm, cxt); > > + __tlb_switch_to_host_vhe(cxt); > > else > > - __tlb_switch_to_host_nvhe(kvm, cxt); > > + __tlb_switch_to_host_nvhe(cxt); > > } > > What does __tlb_switch_to_host() need the kvm_s2_mmu for? Not much. Obviously mechanical conversion of kvm->kvm_s2_mmu, and not finishing the job. I'll fix that. > > [...] > > > > void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu) > > { > > - struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm); > > + struct kvm_s2_mmu *mmu = kern_hyp_va(kern_hyp_va(vcpu)->arch.hw_mmu); > > struct tlb_inv_context cxt; > > > > /* Switch to requested VMID */ > > - __tlb_switch_to_guest(kvm, &cxt); > > + __tlb_switch_to_guest(mmu, &cxt); > > > > __tlbi(vmalle1); > > dsb(nsh); > > isb(); > > > > - __tlb_switch_to_host(kvm, &cxt); > > + __tlb_switch_to_host(mmu, &cxt); > > } > > Does this need the vcpu in the future? > Its the odd one out, the other tlb functions here take the s2_mmu, or nothing. > We only use the s2_mmu here. I think this was done as a way not impact the 32bit code (rest in peace). Definitely a candidate for immediate cleanup. > > [...] > > > > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c > > index e3b9ee268823b..2f99749048285 100644 > > --- a/virt/kvm/arm/mmu.c > > +++ b/virt/kvm/arm/mmu.c > > > @@ -96,31 +96,33 @@ static bool kvm_is_device_pfn(unsigned long pfn) > > * > > * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs. > > */ > > -static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd) > > +static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t *pmd) > > The comment above this function still has '@kvm: pointer to kvm structure.' > > [...] > > > > @@ -331,8 +339,9 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd, > > * destroying the VM), otherwise another faulting VCPU may come in and mess > > * with things behind our backs. > > */ > > -static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size) > > +static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size) > > The comment above this function still has '@kvm: The VM pointer' > > [...] > > > -static void stage2_flush_memslot(struct kvm *kvm, > > +static void stage2_flush_memslot(struct kvm_s2_mmu *mmu, > > struct kvm_memory_slot *memslot) > > { > > Wouldn't something manipulating a memslot have to mess with a set of > kvm_s2_mmu once this is all assembled? stage2_unmap_memslot() takes > struct kvm, it seems odd to pass one kvm_s2_mmu here. Indeed, that doesn't make sense. I guess this was done to match kvm_stage2_flush_range, which does need to take a kvm_s2_mmu (it is directly called from the nesting code). stage2_flush_memslot() only affects the "main" S2 PTs, so passing kvm here should be the right thing to do. > > [...] > > > @@ -886,21 +898,23 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size, > > > -int kvm_alloc_stage2_pgd(struct kvm *kvm) > > +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu) > > { > > phys_addr_t pgd_phys; > > pgd_t *pgd; > > + int cpu; > > > > - if (kvm->arch.pgd != NULL) { > > + if (mmu->pgd != NULL) { > > kvm_err("kvm_arch already initialized?\n"); > > Does this error message still make sense? Probably not anymore. I'll revisit that shortly. > > > > return -EINVAL; > > } > > [...] > > > @@ -1439,9 +1467,10 @@ static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end) > > * @addr: range start address > > * @end: range end address > > */ > > -static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud, > > +static void stage2_wp_pmds(struct kvm_s2_mmu *mmu, pud_t *pud, > > phys_addr_t addr, phys_addr_t end) > > The comment above this function still has 'kvm: kvm instance for the VM'. > > > > { > > + struct kvm *kvm = mmu->kvm; > > pmd_t *pmd; > > phys_addr_t next; > > > > @@ -1461,14 +1490,15 @@ static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud, > > } > > > > /** > > - * stage2_wp_puds - write protect PGD range > > - * @pgd: pointer to pgd entry > > - * @addr: range start address > > - * @end: range end address > > - */ > > -static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd, > > + * stage2_wp_puds - write protect PGD range > > + * @pgd: pointer to pgd entry > > + * @addr: range start address > > + * @end: range end address > > + */ > > +static void stage2_wp_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd, > > phys_addr_t addr, phys_addr_t end) > > Comment was missing @kvm, now its missing @mmu.... > > > > { > > + struct kvm *kvm __maybe_unused = mmu->kvm; > > pud_t *pud; > > phys_addr_t next; > > > > > @@ -1492,12 +1522,13 @@ static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd, > > * @addr: Start address of range > > * @end: End address of range > > */ > > -static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) > > +static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end) > > The comment above this function still ... you get the picture. Thanks a lot for the careful review. I'll respin this shortly, as this is one of the patch I'd like to get in early. M. -- Jazz is not dead, it just smells funny.