Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm

Marc Zyngier <maz@xxxxxxxxxx> · Tue, 05 May 2020 18:59:56 +0100

Hi James,

On Tue, 05 May 2020 17:03:15 +0100,
James Morse <james.morse@xxxxxxx> wrote:
> 
> Hi Marc,
> 
> On 22/04/2020 13:00, Marc Zyngier wrote:
> > From: Christoffer Dall <christoffer.dall@xxxxxxx>
> > 
> > As we are about to reuse our stage 2 page table manipulation code for
> > shadow stage 2 page tables in the context of nested virtualization, we
> > are going to manage multiple stage 2 page tables for a single VM.
> > 
> > This requires some pretty invasive changes to our data structures,
> > which moves the vmid and pgd pointers into a separate structure and
> > change pretty much all of our mmu code to operate on this structure
> > instead.
> > 
> > The new structure is called struct kvm_s2_mmu.
> > 
> > There is no intended functional change by this patch alone.
> 
> It's not obvious to me that VTCR_EL2.T0SZ is a per-vm thing, today
> the size of the IPA space comes from the VMM, its not a
> hardware/compile-time property. Where does the vEL2's T0SZ go?
> ... but using this for nested guests would 'only' cause a
> translation fault, it would still need handling in the emulation
> code. So making it per-vm should be simpler.

My reasoning is that this VTCR defines the virtual HW, and the guest's
own VTCR_EL2 is just another guest system register. It is the role of
the NV code to compose the two in a way that makes sense (delivering
translation faults if the NV guest's S2 output range doesn't fit in
the host's view of the VM IPA range).

> But accessing VTCR is why the stage2_dissolve_p?d() stuff still
> needs the kvm pointer, hence the backreference... it might be neater
> to push the vtcr properties into kvm_s2_mmu that way you could drop
> the kvm backref, and only things that take vm-wide locks would need
> the kvm pointer. But I don't think it matters.

That's an interesting consideration. I'll have a look.

> I think I get it. I can't see anything that should be the other
> vm/vcpu pointer.
> 
> Reviewed-by: James Morse <james.morse@xxxxxxx>

Thanks!

> Some boring fiddly stuff:
> 
> [...]
> 
> > @@ -125,24 +123,24 @@ static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm,
> >  	}
> >  }
> >  
> > -static void __hyp_text __tlb_switch_to_host(struct kvm *kvm,
> > +static void __hyp_text __tlb_switch_to_host(struct kvm_s2_mmu *mmu,
> >  					    struct tlb_inv_context *cxt)
> >  {
> >  	if (has_vhe())
> > -		__tlb_switch_to_host_vhe(kvm, cxt);
> > +		__tlb_switch_to_host_vhe(cxt);
> >  	else
> > -		__tlb_switch_to_host_nvhe(kvm, cxt);
> > +		__tlb_switch_to_host_nvhe(cxt);
> >  }
> 
> What does __tlb_switch_to_host() need the kvm_s2_mmu for?

Not much. Obviously mechanical conversion of kvm->kvm_s2_mmu, and not
finishing the job. I'll fix that.

> 
> [...]
> 
> 
> >  void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
> >  {
> > -	struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
> > +	struct kvm_s2_mmu *mmu = kern_hyp_va(kern_hyp_va(vcpu)->arch.hw_mmu);
> >  	struct tlb_inv_context cxt;
> >
> >  	/* Switch to requested VMID */
> > -	__tlb_switch_to_guest(kvm, &cxt);
> > +	__tlb_switch_to_guest(mmu, &cxt);
> >
> >  	__tlbi(vmalle1);
> >  	dsb(nsh);
> >  	isb();
> >
> > -	__tlb_switch_to_host(kvm, &cxt);
> > +	__tlb_switch_to_host(mmu, &cxt);
> >  }
> 
> Does this need the vcpu in the future?
> Its the odd one out, the other tlb functions here take the s2_mmu, or nothing.
> We only use the s2_mmu here.

I think this was done as a way not impact the 32bit code (rest in
peace). Definitely a candidate for immediate cleanup.

> 
> [...]
> 
> 
> > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > index e3b9ee268823b..2f99749048285 100644
> > --- a/virt/kvm/arm/mmu.c
> > +++ b/virt/kvm/arm/mmu.c
> 
> > @@ -96,31 +96,33 @@ static bool kvm_is_device_pfn(unsigned long pfn)
> >   *
> >   * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs.
> >   */
> > -static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
> > +static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t *pmd)
> 
> The comment above this function still has '@kvm:	pointer to kvm structure.'
> 
> [...]
> 
> 
> > @@ -331,8 +339,9 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
> >   * destroying the VM), otherwise another faulting VCPU may come in and mess
> >   * with things behind our backs.
> >   */
> > -static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
> > +static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> 
> The comment above this function still has '@kvm:   The VM pointer'
> 
> [...]
> 
> > -static void stage2_flush_memslot(struct kvm *kvm,
> > +static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
> >  				 struct kvm_memory_slot *memslot)
> >  {
> 
> Wouldn't something manipulating a memslot have to mess with a set of
> kvm_s2_mmu once this is all assembled?  stage2_unmap_memslot() takes
> struct kvm, it seems odd to pass one kvm_s2_mmu here.

Indeed, that doesn't make sense. I guess this was done to match
kvm_stage2_flush_range, which does need to take a kvm_s2_mmu (it is
directly called from the nesting code).

stage2_flush_memslot() only affects the "main" S2 PTs, so passing kvm
here should be the right thing to do.

> 
> [...]
> 
> > @@ -886,21 +898,23 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
> 
> > -int kvm_alloc_stage2_pgd(struct kvm *kvm)
> > +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
> >  {
> >  	phys_addr_t pgd_phys;
> >  	pgd_t *pgd;
> > +	int cpu;
> >  
> > -	if (kvm->arch.pgd != NULL) {
> > +	if (mmu->pgd != NULL) {
> >  		kvm_err("kvm_arch already initialized?\n");
> 
> Does this error message still make sense?

Probably not anymore. I'll revisit that shortly.

> 
> 
> >  		return -EINVAL;
> >  	}
> 
> [...]
> 
> > @@ -1439,9 +1467,10 @@ static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
> >   * @addr:	range start address
> >   * @end:	range end address
> >   */
> > -static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
> > +static void stage2_wp_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
> >  			   phys_addr_t addr, phys_addr_t end)
> 
> The comment above this function still has 'kvm:		kvm instance for the VM'.
> 
> 
> >  {
> > +	struct kvm *kvm = mmu->kvm;
> >  	pmd_t *pmd;
> >  	phys_addr_t next;
> >  
> > @@ -1461,14 +1490,15 @@ static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
> >  }
> >  
> >  /**
> > - * stage2_wp_puds - write protect PGD range
> > - * @pgd:	pointer to pgd entry
> > - * @addr:	range start address
> > - * @end:	range end address
> > - */
> > -static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
> > +  * stage2_wp_puds - write protect PGD range
> > +  * @pgd:	pointer to pgd entry
> > +  * @addr:	range start address
> > +  * @end:	range end address
> > +  */
> > +static void  stage2_wp_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
> >  			    phys_addr_t addr, phys_addr_t end)
> 
> Comment was missing @kvm, now its missing @mmu....
> 
> 
> >  {
> > +	struct kvm *kvm __maybe_unused = mmu->kvm;
> >  	pud_t *pud;
> >  	phys_addr_t next;
> >  
> 
> > @@ -1492,12 +1522,13 @@ static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
> >   * @addr:	Start address of range
> >   * @end:	End address of range
> >   */
> > -static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> > +static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
> 
> The comment above this function still ... you get the picture.

Thanks a lot for the careful review. I'll respin this shortly, as this
is one of the patch I'd like to get in early.

	M.

-- 
Jazz is not dead, it just smells funny.