Hi, On Thu, Apr 28, 2022 at 04:22:58PM +0100, Marc Zyngier wrote: > On Thu, 28 Apr 2022 09:46:21 +0100, > Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote: > > > > Hi, > > > > On Wed, Apr 27, 2022 at 11:04:34PM +0100, Marc Zyngier wrote: > > > When taking a translation fault for an IPA that is outside of > > > the range defined by the hypervisor (between the HW PARange and > > > the IPA range), we stupidly treat it as an IO and forward the access > > > to userspace. Of course, userspace can't do much with it, and things > > > end badly. > > > > > > Arguably, the guest is braindead, but we should at least catch the > > > case and inject an exception. > > > > > > Check the faulting IPA against: > > > - the sanitised PARange: inject an address size fault > > > - the IPA size: inject an abort > > > > > > Reported-by: Christoffer Dall <christoffer.dall@xxxxxxx> > > > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx> > > > --- > > > arch/arm64/include/asm/kvm_emulate.h | 1 + > > > arch/arm64/kvm/inject_fault.c | 28 ++++++++++++++++++++++++++++ > > > arch/arm64/kvm/mmu.c | 19 +++++++++++++++++++ > > > 3 files changed, 48 insertions(+) > > > > > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h > > > index 7496deab025a..f71358271b71 100644 > > > --- a/arch/arm64/include/asm/kvm_emulate.h > > > +++ b/arch/arm64/include/asm/kvm_emulate.h > > > @@ -40,6 +40,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu); > > > void kvm_inject_vabt(struct kvm_vcpu *vcpu); > > > void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr); > > > void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr); > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu); > > > > > > void kvm_vcpu_wfi(struct kvm_vcpu *vcpu); > > > > > > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c > > > index b47df73e98d7..ba20405d2dc2 100644 > > > --- a/arch/arm64/kvm/inject_fault.c > > > +++ b/arch/arm64/kvm/inject_fault.c > > > @@ -145,6 +145,34 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr) > > > inject_abt64(vcpu, true, addr); > > > } > > > > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu) > > > +{ > > > + unsigned long addr, esr; > > > + > > > + addr = kvm_vcpu_get_fault_ipa(vcpu); > > > + addr |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0); > > > + > > > + if (kvm_vcpu_trap_is_iabt(vcpu)) > > > + kvm_inject_pabt(vcpu, addr); > > > + else > > > + kvm_inject_dabt(vcpu, addr); > > > + > > > + /* > > > + * If AArch64 or LPAE, set FSC to 0 to indicate an Address > > > + * Size Fault at level 0, as if exceeding PARange. > > > + * > > > + * Non-LPAE guests will only get the external abort, as there > > > + * is no way to to describe the ASF. > > > + */ > > > + if (vcpu_el1_is_32bit(vcpu) && > > > + !(vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE)) > > > + return; > > > + > > > + esr = vcpu_read_sys_reg(vcpu, ESR_EL1); > > > + esr &= ~GENMASK_ULL(5, 0); > > > + vcpu_write_sys_reg(vcpu, esr, ESR_EL1); > > > +} > > > + > > > /** > > > * kvm_inject_undefined - inject an undefined instruction into the guest > > > * @vcpu: The vCPU in which to inject the exception > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > > index 53ae2c0640bc..5400fc020164 100644 > > > --- a/arch/arm64/kvm/mmu.c > > > +++ b/arch/arm64/kvm/mmu.c > > > @@ -1337,6 +1337,25 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu) > > > fault_ipa = kvm_vcpu_get_fault_ipa(vcpu); > > > is_iabt = kvm_vcpu_trap_is_iabt(vcpu); > > > > > > + if (fault_status == FSC_FAULT) { > > > + /* Beyond sanitised PARange (which is the IPA limit) */ > > > + if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) { > > > + kvm_inject_size_fault(vcpu); > > > + return 1; > > > + } > > > + > > > + /* Falls between the IPA range and the PARange? */ > > > + if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) { > > > + fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0); > > > + > > > + if (is_iabt) > > > + kvm_inject_pabt(vcpu, fault_ipa); > > > + else > > > + kvm_inject_dabt(vcpu, fault_ipa); > > > + return 1; > > > + } > > > > Doesn't KVM treat faults outside a valid memslot (aka guest RAM) as MMIO > > aborts? From the guest's point of view, the IPA is valid because it's > > inside the HW PARange, so it's not entirely impossible that the VMM put a > > device there. > > Sure. But the generated IPA is outside of the range the VMM has asked > to handle. The IPA space describes the whole of the guest address > space, and there shouldn't be anything outside of it. > > We actually state in the documentation that the IPA Size limit *is* > the physical address size for the VM. If the VMM places something > outside if the IPA space and still expect something to be reported to > it, we have a problem (in some cases, we may want to actually put page > tables in place even for MMIO that traps to userspace -- see my > earlier work on MMIO guard). If you mean this bit: On arm64, the physical address size for a VM (IPA Size limit) is limited to 40bits by default. The limit can be configured if the host supports the extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type identifier, where IPA_Bits is the maximum width of any physical address used by the VM. And then below there is this paragraph: Please note that configuring the IPA size does not affect the capability exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects **size of the address translated by the stage2 level (guest physical to host physical address translations)**. Emphasis added by me. It looks to me like the two paragraph state different things, first says the IPA size caps "the physical address size for a VM", the second that it caps the RAM size - "size of the address translated by the stage 2 level. I have no problem with either, but it looks confusing. So if a VMM that wants to put devices above RAM it must make sure that the IPA size is extended to match, did I get that right? I'm also a bit confused about the rationale. Why is the PARange exposed to the guest in effect the wrong value, because the true PARange is defined by IPA size? Thanks, Alex