Hi Marc,

Drive-by thing I spotted.

On Mon, Nov 20, 2023 at 01:10:03PM +0000, Marc Zyngier wrote:
> If we are faulting on a shadow stage 2 translation, we first walk the
> guest hypervisor's stage 2 page table to see if it has a mapping. If
> not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
> create a mapping in the shadow stage 2 page table.
>
> Note that we have to deal with two IPAs when we got a shadow stage 2
> page fault. One is the address we faulted on, and is in the L2 guest
> phys space. The other is from the guest stage-2 page table walk, and is
> in the L1 guest phys space. To differentiate them, we rename variables
> so that fault_ipa is used for the former and ipa is used for the latter.
>
> Co-developed-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
> Co-developed-by: Jintack Lim <jintack.lim@xxxxxxxxxx>
> Signed-off-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
> Signed-off-by: Jintack Lim <jintack.lim@xxxxxxxxxx>
> [maz: rewrote this multiple times...]
> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> ---
>  arch/arm64/include/asm/kvm_emulate.h |  7 +++
>  arch/arm64/include/asm/kvm_nested.h  | 19 ++++++
>  arch/arm64/kvm/mmu.c                 | 89 ++++++++++++++++++++++++----
>  arch/arm64/kvm/nested.c              | 48 +++++++++++++++
>  4 files changed, 153 insertions(+), 10 deletions(-)
>

[.. snip ..]

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 588ce46c0ad0..41de7616b735 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1412,14 +1412,16 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
>  }
>  
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> -			  struct kvm_memory_slot *memslot, unsigned long hva,
> -			  unsigned long fault_status)
> +			  struct kvm_s2_trans *nested,
> +			  struct kvm_memory_slot *memslot,
> +			  unsigned long hva, unsigned long fault_status)
>  {
>  	int ret = 0;
>  	bool write_fault, writable, force_pte = false;
>  	bool exec_fault, mte_allowed;
>  	bool device = false;
>  	unsigned long mmu_seq;
> +	phys_addr_t ipa = fault_ipa;
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>  	struct vm_area_struct *vma;
> @@ -1504,10 +1506,38 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	}
>  
>  	vma_pagesize = 1UL << vma_shift;
> +
> +	if (nested) {
> +		unsigned long max_map_size;
> +
> +		max_map_size = force_pte ? PUD_SIZE : PAGE_SIZE;

This seems like the wrong way around, presumably you want PAGE_SIZE for
force_pte? (A rough standalone sketch of what I'd expect is at the end
of this mail.)

> +
> +		ipa = kvm_s2_trans_output(nested);
> +
> +		/*
> +		 * If we're about to create a shadow stage 2 entry, then we
> +		 * can only create a block mapping if the guest stage 2 page
> +		 * table uses at least as big a mapping.
> +		 */
> +		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
> +
> +		/*
> +		 * Be careful that if the mapping size falls between
> +		 * two host sizes, take the smallest of the two.
> +		 */
> +		if (max_map_size >= PMD_SIZE && max_map_size < PUD_SIZE)
> +			max_map_size = PMD_SIZE;
> +		else if (max_map_size >= PAGE_SIZE && max_map_size < PMD_SIZE)
> +			max_map_size = PAGE_SIZE;
> +
> +		force_pte = (max_map_size == PAGE_SIZE);
> +		vma_pagesize = min(vma_pagesize, (long)max_map_size);
> +	}
> +
>  	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
>  		fault_ipa &= ~(vma_pagesize - 1);
>  
> -	gfn = fault_ipa >> PAGE_SHIFT;
> +	gfn = ipa >> PAGE_SHIFT;
>  	mte_allowed = kvm_vma_mte_allowed(vma);
>  
>  	/* Don't use the VMA after the unlock -- it may have vanished */

[.. snip ..]

Thanks,
Joey
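
Sketch referenced above: a quick userspace mock-up of the nested
max_map_size clamping, with the ternary the way round I'd expect it
(PAGE_SIZE when force_pte). The SZ_* constants assume 4K pages and
shadow_max_map_size() is a made-up name for illustration, so this is
not a drop-in for the patch, just the shape of the logic.

/*
 * Standalone sketch only: mirrors the nested clamping above, but with
 * force_pte starting from PAGE_SIZE rather than PUD_SIZE.
 */
#include <stdbool.h>
#include <stdio.h>

#define SZ_4K	(1UL << 12)	/* stand-in for PAGE_SIZE */
#define SZ_2M	(1UL << 21)	/* stand-in for PMD_SIZE */
#define SZ_1G	(1UL << 30)	/* stand-in for PUD_SIZE */

static unsigned long shadow_max_map_size(bool force_pte,
					 unsigned long s2_trans_size)
{
	/* PAGE_SIZE when a PTE is forced, otherwise allow up to PUD_SIZE */
	unsigned long max_map_size = force_pte ? SZ_4K : SZ_1G;

	/* Never map bigger than the guest hypervisor's own stage 2 mapping */
	if (s2_trans_size < max_map_size)
		max_map_size = s2_trans_size;

	/* If the size falls between two host sizes, take the smaller one */
	if (max_map_size >= SZ_2M && max_map_size < SZ_1G)
		max_map_size = SZ_2M;
	else if (max_map_size >= SZ_4K && max_map_size < SZ_2M)
		max_map_size = SZ_4K;

	return max_map_size;
}

int main(void)
{
	/* L1 uses a 2M block, no forced PTE: shadow can use 2M (2097152) */
	printf("%lu\n", shadow_max_map_size(false, SZ_2M));
	/* force_pte: clamps to 4K (4096) even if L1 uses a 1G block */
	printf("%lu\n", shadow_max_map_size(true, SZ_1G));
	return 0;
}

With the ternary as in the patch, the force_pte case in this mock-up
starts from PUD_SIZE and nothing pulls it back down, so a forced-PTE
fault could still end up with a block mapping, which is what made me
stop and look.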