Hi, On Thu, Apr 28, 2022 at 06:55:56PM +0100, Marc Zyngier wrote: > On Thu, 28 Apr 2022 17:07:21 +0100, > Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote: > > > > Hi, > > > > On Thu, Apr 28, 2022 at 04:22:58PM +0100, Marc Zyngier wrote: > > > On Thu, 28 Apr 2022 09:46:21 +0100, > > > Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote: > > > > > > > > Hi, > > > > > > > > On Wed, Apr 27, 2022 at 11:04:34PM +0100, Marc Zyngier wrote: > > > > > When taking a translation fault for an IPA that is outside of > > > > > the range defined by the hypervisor (between the HW PARange and > > > > > the IPA range), we stupidly treat it as an IO and forward the access > > > > > to userspace. Of course, userspace can't do much with it, and things > > > > > end badly. > > > > > > > > > > Arguably, the guest is braindead, but we should at least catch the > > > > > case and inject an exception. > > > > > > > > > > Check the faulting IPA against: > > > > > - the sanitised PARange: inject an address size fault > > > > > - the IPA size: inject an abort > > > > > > > > > > Reported-by: Christoffer Dall <christoffer.dall@xxxxxxx> > > > > > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx> > > > > > --- > > > > > arch/arm64/include/asm/kvm_emulate.h | 1 + > > > > > arch/arm64/kvm/inject_fault.c | 28 ++++++++++++++++++++++++++++ > > > > > arch/arm64/kvm/mmu.c | 19 +++++++++++++++++++ > > > > > 3 files changed, 48 insertions(+) > > > > > > > > > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h > > > > > index 7496deab025a..f71358271b71 100644 > > > > > --- a/arch/arm64/include/asm/kvm_emulate.h > > > > > +++ b/arch/arm64/include/asm/kvm_emulate.h > > > > > @@ -40,6 +40,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu); > > > > > void kvm_inject_vabt(struct kvm_vcpu *vcpu); > > > > > void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr); > > > > > void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr); > > > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu); > > > > > > > > > > void kvm_vcpu_wfi(struct kvm_vcpu *vcpu); > > > > > > > > > > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c > > > > > index b47df73e98d7..ba20405d2dc2 100644 > > > > > --- a/arch/arm64/kvm/inject_fault.c > > > > > +++ b/arch/arm64/kvm/inject_fault.c > > > > > @@ -145,6 +145,34 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr) > > > > > inject_abt64(vcpu, true, addr); > > > > > } > > > > > > > > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu) > > > > > +{ > > > > > + unsigned long addr, esr; > > > > > + > > > > > + addr = kvm_vcpu_get_fault_ipa(vcpu); > > > > > + addr |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0); > > > > > + > > > > > + if (kvm_vcpu_trap_is_iabt(vcpu)) > > > > > + kvm_inject_pabt(vcpu, addr); > > > > > + else > > > > > + kvm_inject_dabt(vcpu, addr); > > > > > + > > > > > + /* > > > > > + * If AArch64 or LPAE, set FSC to 0 to indicate an Address > > > > > + * Size Fault at level 0, as if exceeding PARange. > > > > > + * > > > > > + * Non-LPAE guests will only get the external abort, as there > > > > > + * is no way to to describe the ASF. > > > > > + */ > > > > > + if (vcpu_el1_is_32bit(vcpu) && > > > > > + !(vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE)) > > > > > + return; > > > > > + > > > > > + esr = vcpu_read_sys_reg(vcpu, ESR_EL1); > > > > > + esr &= ~GENMASK_ULL(5, 0); > > > > > + vcpu_write_sys_reg(vcpu, esr, ESR_EL1); > > > > > +} > > > > > + > > > > > /** > > > > > * kvm_inject_undefined - inject an undefined instruction into the guest > > > > > * @vcpu: The vCPU in which to inject the exception > > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > > > > index 53ae2c0640bc..5400fc020164 100644 > > > > > --- a/arch/arm64/kvm/mmu.c > > > > > +++ b/arch/arm64/kvm/mmu.c > > > > > @@ -1337,6 +1337,25 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu) > > > > > fault_ipa = kvm_vcpu_get_fault_ipa(vcpu); > > > > > is_iabt = kvm_vcpu_trap_is_iabt(vcpu); > > > > > > > > > > + if (fault_status == FSC_FAULT) { > > > > > + /* Beyond sanitised PARange (which is the IPA limit) */ > > > > > + if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) { > > > > > + kvm_inject_size_fault(vcpu); > > > > > + return 1; > > > > > + } > > > > > + > > > > > + /* Falls between the IPA range and the PARange? */ > > > > > + if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) { > > > > > + fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0); > > > > > + > > > > > + if (is_iabt) > > > > > + kvm_inject_pabt(vcpu, fault_ipa); > > > > > + else > > > > > + kvm_inject_dabt(vcpu, fault_ipa); > > > > > + return 1; > > > > > + } > > > > > > > > Doesn't KVM treat faults outside a valid memslot (aka guest RAM) as MMIO > > > > aborts? From the guest's point of view, the IPA is valid because it's > > > > inside the HW PARange, so it's not entirely impossible that the VMM put a > > > > device there. > > > > > > Sure. But the generated IPA is outside of the range the VMM has asked > > > to handle. The IPA space describes the whole of the guest address > > > space, and there shouldn't be anything outside of it. > > > > > > We actually state in the documentation that the IPA Size limit *is* > > > the physical address size for the VM. If the VMM places something > > > outside if the IPA space and still expect something to be reported to > > > it, we have a problem (in some cases, we may want to actually put page > > > tables in place even for MMIO that traps to userspace -- see my > > > earlier work on MMIO guard). > > > > If you mean this bit: > > > > On arm64, the physical address size for a VM (IPA Size limit) is limited > > to 40bits by default. The limit can be configured if the host supports the > > extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use > > KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type > > identifier, where IPA_Bits is the maximum width of any physical > > address used by the VM. > > > > And then below there is this paragraph: > > > > Please note that configuring the IPA size does not affect the capability > > exposed by the guest CPUs in ID_AA64MMFR0_EL1[PARange]. It only affects > > **size of the address translated by the stage2 level (guest physical to > > host physical address translations)**. > > I don't see that as a contradiction. It just says that we don't > repaint PARange. > > > > > Emphasis added by me. > > > > It looks to me like the two paragraph state different things, first says > > the IPA size caps "the physical address size for a VM", the second that it > > caps the RAM size - "size of the address translated by the stage 2 level. > > That's not the way I understand it. It just gives a textbook > definition of the IPA space. And to be clear, this is just an > implementation detail. We should be able to populate all full IPA > space with faulting entries and keep the behaviour the same. > > > I have no problem with either, but it looks confusing. > > > > So if a VMM that wants to put devices above RAM it must make sure that the > > IPA size is extended to match, did I get that right? > > Yes. Anything that is reacheable by a memory transaction has to fit in > the IPA space. > > > I'm also a bit confused about the rationale. Why is the PARange exposed to > > the guest in effect the wrong value, because the true PARange is defined by > > IPA size? > > PARange and IPA range don't have the same granularity. You can't > express a PARange of 37 bits, for example, while it is perfectly > possible for the IPA range. And they do cover two different concepts: > the IPA space means nothing to the guest. I see, thank you for the detailed explanation! Thanks, Alex _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm