On Thu, 28 Apr 2022 17:07:21 +0100, Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote: > > Hi, > > On Thu, Apr 28, 2022 at 04:22:58PM +0100, Marc Zyngier wrote: > > On Thu, 28 Apr 2022 09:46:21 +0100, > > Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote: > > > > > > Hi, > > > > > > On Wed, Apr 27, 2022 at 11:04:34PM +0100, Marc Zyngier wrote: > > > > When taking a translation fault for an IPA that is outside of > > > > the range defined by the hypervisor (between the HW PARange and > > > > the IPA range), we stupidly treat it as an IO and forward the access > > > > to userspace. Of course, userspace can't do much with it, and things > > > > end badly. > > > > > > > > Arguably, the guest is braindead, but we should at least catch the > > > > case and inject an exception. > > > > > > > > Check the faulting IPA against: > > > > - the sanitised PARange: inject an address size fault > > > > - the IPA size: inject an abort > > > > > > > > Reported-by: Christoffer Dall <christoffer.dall@xxxxxxx> > > > > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx> > > > > --- > > > > arch/arm64/include/asm/kvm_emulate.h | 1 + > > > > arch/arm64/kvm/inject_fault.c | 28 ++++++++++++++++++++++++++++ > > > > arch/arm64/kvm/mmu.c | 19 +++++++++++++++++++ > > > > 3 files changed, 48 insertions(+) > > > > > > > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h > > > > index 7496deab025a..f71358271b71 100644 > > > > --- a/arch/arm64/include/asm/kvm_emulate.h > > > > +++ b/arch/arm64/include/asm/kvm_emulate.h > > > > @@ -40,6 +40,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu); > > > > void kvm_inject_vabt(struct kvm_vcpu *vcpu); > > > > void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr); > > > > void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr); > > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu); > > > > > > > > void kvm_vcpu_wfi(struct kvm_vcpu *vcpu); > > > > > > > > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c > > > > index b47df73e98d7..ba20405d2dc2 100644 > > > > --- a/arch/arm64/kvm/inject_fault.c > > > > +++ b/arch/arm64/kvm/inject_fault.c > > > > @@ -145,6 +145,34 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr) > > > > inject_abt64(vcpu, true, addr); > > > > } > > > > > > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu) > > > > +{ > > > > + unsigned long addr, esr; > > > > + > > > > + addr = kvm_vcpu_get_fault_ipa(vcpu); > > > > + addr |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0); > > > > + > > > > + if (kvm_vcpu_trap_is_iabt(vcpu)) > > > > + kvm_inject_pabt(vcpu, addr); > > > > + else > > > > + kvm_inject_dabt(vcpu, addr); > > > > + > > > > + /* > > > > + * If AArch64 or LPAE, set FSC to 0 to indicate an Address > > > > + * Size Fault at level 0, as if exceeding PARange. > > > > + * > > > > + * Non-LPAE guests will only get the external abort, as there > > > > + * is no way to to describe the ASF. > > > > + */ > > > > + if (vcpu_el1_is_32bit(vcpu) && > > > > + !(vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE)) > > > > + return; > > > > + > > > > + esr = vcpu_read_sys_reg(vcpu, ESR_EL1); > > > > + esr &= ~GENMASK_ULL(5, 0); > > > > + vcpu_write_sys_reg(vcpu, esr, ESR_EL1); > > > > +} > > > > + > > > > /** > > > > * kvm_inject_undefined - inject an undefined instruction into the guest > > > > * @vcpu: The vCPU in which to inject the exception > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > > > index 53ae2c0640bc..5400fc020164 100644 > > > > --- a/arch/arm64/kvm/mmu.c > > > > +++ b/arch/arm64/kvm/mmu.c > > > > @@ -1337,6 +1337,25 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu) > > > > fault_ipa = kvm_vcpu_get_fault_ipa(vcpu); > > > > is_iabt = kvm_vcpu_trap_is_iabt(vcpu); > > > > > > > > + if (fault_status == FSC_FAULT) { > > > > + /* Beyond sanitised PARange (which is the IPA limit) */ > > > > + if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) { > > > > + kvm_inject_size_fault(vcpu); > > > > + return 1; > > > > + } > > > > + > > > > + /* Falls between the IPA range and the PARange? */ > > > > + if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) { > > > > + fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0); > > > > + > > > > + if (is_iabt) > > > > + kvm_inject_pabt(vcpu, fault_ipa); > > > > + else > > > > + kvm_inject_dabt(vcpu, fault_ipa); > > > > + return 1; > > > > + } > > > > > > Doesn't KVM treat faults outside a valid memslot (aka guest RAM) as MMIO > > > aborts? From the guest's point of view, the IPA is valid because it's > > > inside the HW PARange, so it's not entirely impossible that the VMM put a > > > device there. > > > > Sure. But the generated IPA is outside of the range the VMM has asked > > to handle. The IPA space describes the whole of the guest address > > space, and there shouldn't be anything outside of it. > > > > We actually state in the documentation that the IPA Size limit *is* > > the physical address size for the VM. If the VMM places something > > outside if the IPA space and still expect something to be reported to > > it, we have a problem (in some cases, we may want to actually put page > > tables in place even for MMIO that traps to userspace -- see my > > earlier work on MMIO guard). > > If you mean this bit: > > On arm64, the physical address size for a VM (IPA Size limit) is limited > to 40bits by default. The limit can be configured if the host supports the > extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use > KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type > identifier, where IPA_Bits is the maximum width of any physical > address used by the VM. > > And then below there is this paragraph: > > Please note that configuring the IPA size does not affect the capability > exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects > **size of the address translated by the stage2 level (guest physical to > host physical address translations)**. I don't see that as a contradiction. It just says that we don't repaint PARange. > > Emphasis added by me. > > It looks to me like the two paragraph state different things, first says > the IPA size caps "the physical address size for a VM", the second that it > caps the RAM size - "size of the address translated by the stage 2 level. That's not the way I understand it. It just gives a textbook definition of the IPA space. And to be clear, this is just an implementation detail. We should be able to populate all full IPA space with faulting entries and keep the behaviour the same. > I have no problem with either, but it looks confusing. > > So if a VMM that wants to put devices above RAM it must make sure that the > IPA size is extended to match, did I get that right? Yes. Anything that is reacheable by a memory transaction has to fit in the IPA space. > I'm also a bit confused about the rationale. Why is the PARange exposed to > the guest in effect the wrong value, because the true PARange is defined by > IPA size? PARange and IPA range don't have the same granularity. You can't express a PARange of 37 bits, for example, while it is perfectly possible for the IPA range. And they do cover two different concepts: the IPA space means nothing to the guest. M. -- Without deviation from the norm, progress is not possible.