On Thu, 28 Apr 2022 09:46:21 +0100, Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote: > > Hi, > > On Wed, Apr 27, 2022 at 11:04:34PM +0100, Marc Zyngier wrote: > > When taking a translation fault for an IPA that is outside of > > the range defined by the hypervisor (between the HW PARange and > > the IPA range), we stupidly treat it as an IO and forward the access > > to userspace. Of course, userspace can't do much with it, and things > > end badly. > > > > Arguably, the guest is braindead, but we should at least catch the > > case and inject an exception. > > > > Check the faulting IPA against: > > - the sanitised PARange: inject an address size fault > > - the IPA size: inject an abort > > > > Reported-by: Christoffer Dall <christoffer.dall@xxxxxxx> > > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx> > > --- > > arch/arm64/include/asm/kvm_emulate.h | 1 + > > arch/arm64/kvm/inject_fault.c | 28 ++++++++++++++++++++++++++++ > > arch/arm64/kvm/mmu.c | 19 +++++++++++++++++++ > > 3 files changed, 48 insertions(+) > > > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h > > index 7496deab025a..f71358271b71 100644 > > --- a/arch/arm64/include/asm/kvm_emulate.h > > +++ b/arch/arm64/include/asm/kvm_emulate.h > > @@ -40,6 +40,7 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu); > > void kvm_inject_vabt(struct kvm_vcpu *vcpu); > > void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr); > > void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr); > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu); > > > > void kvm_vcpu_wfi(struct kvm_vcpu *vcpu); > > > > diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c > > index b47df73e98d7..ba20405d2dc2 100644 > > --- a/arch/arm64/kvm/inject_fault.c > > +++ b/arch/arm64/kvm/inject_fault.c > > @@ -145,6 +145,34 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr) > > inject_abt64(vcpu, true, addr); > > } > > > > +void kvm_inject_size_fault(struct kvm_vcpu *vcpu) > > +{ > > + unsigned long addr, esr; > > + > > + addr = kvm_vcpu_get_fault_ipa(vcpu); > > + addr |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0); > > + > > + if (kvm_vcpu_trap_is_iabt(vcpu)) > > + kvm_inject_pabt(vcpu, addr); > > + else > > + kvm_inject_dabt(vcpu, addr); > > + > > + /* > > + * If AArch64 or LPAE, set FSC to 0 to indicate an Address > > + * Size Fault at level 0, as if exceeding PARange. > > + * > > + * Non-LPAE guests will only get the external abort, as there > > + * is no way to to describe the ASF. > > + */ > > + if (vcpu_el1_is_32bit(vcpu) && > > + !(vcpu_read_sys_reg(vcpu, TCR_EL1) & TTBCR_EAE)) > > + return; > > + > > + esr = vcpu_read_sys_reg(vcpu, ESR_EL1); > > + esr &= ~GENMASK_ULL(5, 0); > > + vcpu_write_sys_reg(vcpu, esr, ESR_EL1); > > +} > > + > > /** > > * kvm_inject_undefined - inject an undefined instruction into the guest > > * @vcpu: The vCPU in which to inject the exception > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > index 53ae2c0640bc..5400fc020164 100644 > > --- a/arch/arm64/kvm/mmu.c > > +++ b/arch/arm64/kvm/mmu.c > > @@ -1337,6 +1337,25 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu) > > fault_ipa = kvm_vcpu_get_fault_ipa(vcpu); > > is_iabt = kvm_vcpu_trap_is_iabt(vcpu); > > > > + if (fault_status == FSC_FAULT) { > > + /* Beyond sanitised PARange (which is the IPA limit) */ > > + if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) { > > + kvm_inject_size_fault(vcpu); > > + return 1; > > + } > > + > > + /* Falls between the IPA range and the PARange? */ > > + if (fault_ipa >= BIT_ULL(vcpu->arch.hw_mmu->pgt->ia_bits)) { > > + fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0); > > + > > + if (is_iabt) > > + kvm_inject_pabt(vcpu, fault_ipa); > > + else > > + kvm_inject_dabt(vcpu, fault_ipa); > > + return 1; > > + } > > Doesn't KVM treat faults outside a valid memslot (aka guest RAM) as MMIO > aborts? From the guest's point of view, the IPA is valid because it's > inside the HW PARange, so it's not entirely impossible that the VMM put a > device there. Sure. But the generated IPA is outside of the range the VMM has asked to handle. The IPA space describes the whole of the guest address space, and there shouldn't be anything outside of it. We actually state in the documentation that the IPA Size limit *is* the physical address size for the VM. If the VMM places something outside if the IPA space and still expect something to be reported to it, we have a problem (in some cases, we may want to actually put page tables in place even for MMIO that traps to userspace -- see my earlier work on MMIO guard). Does it make sense to you? M. -- Without deviation from the norm, progress is not possible.