Hi Marc,

Finally managed to go through the patch. Bunch of nitpicks from me (can
be safely ignored), and some corner cases where KVM deviates from the
spec.

On Wed, Jul 31, 2024 at 08:40:26PM +0100, Marc Zyngier wrote:
> In order to plug the brokenness of our current AT implementation,
> we need a SW walker that is going to... err.. walk the S1 tables
> and tell us what it finds.
>
> Of course, it builds on top of our S2 walker, and share similar
> concepts. The beauty of it is that since it uses kvm_read_guest(),
> it is able to bring back pages that have been otherwise evicted.
>
> This is then plugged in the two AT S1 emulation functions as
> a "slow path" fallback. I'm not sure it is that slow, but hey.
>
> Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
> ---
>  arch/arm64/kvm/at.c | 567 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 565 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
> index 9865d29b3149..8e1f0837e309 100644
> --- a/arch/arm64/kvm/at.c
> +++ b/arch/arm64/kvm/at.c
> @@ -4,9 +4,372 @@
>   * Author: Jintack Lim <jintack.lim@xxxxxxxxxx>
>   */
>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/esr.h>
>  #include <asm/kvm_hyp.h>
>  #include <asm/kvm_mmu.h>
>
> +enum trans_regime {
> +	TR_EL10,
> +	TR_EL20,
> +	TR_EL2,
> +};
> +
> +struct s1_walk_info {
> +	u64		baddr;
> +	enum trans_regime regime;
> +	unsigned int	max_oa_bits;
> +	unsigned int	pgshift;
> +	unsigned int	txsz;
> +	int		sl;
> +	bool		hpd;
> +	bool		be;
> +	bool		s2;
> +};
> +
> +struct s1_walk_result {
> +	union {
> +		struct {
> +			u64	desc;
> +			u64	pa;
> +			s8	level;
> +			u8	APTable;
> +			bool	UXNTable;
> +			bool	PXNTable;
> +		};
> +		struct {
> +			u8	fst;
> +			bool	ptw;
> +			bool	s2;
> +		};
> +	};
> +	bool	failed;
> +};
> +
> +static void fail_s1_walk(struct s1_walk_result *wr, u8 fst, bool ptw, bool s2)
> +{
> +	wr->fst		= fst;
> +	wr->ptw		= ptw;
> +	wr->s2		= s2;
> +	wr->failed	= true;
> +}
> +
> +#define S1_MMU_DISABLED		(-127)
> +
> +static int get_ia_size(struct s1_walk_info *wi)
> +{
> +	return 64 - wi->txsz;
> +}
> +
> +/* return true of the IPA is out of the OA range */

*R*eturn true *if* the IPA is out of the OA range?

> +static bool check_output_size(u64 ipa, struct s1_walk_info *wi)
> +{
> +	return wi->max_oa_bits < 48 && (ipa & GENMASK_ULL(47, wi->max_oa_bits));
> +}

Matches AArch64.OAOutOfRange(), where KVM supports a maximum oasize of
48 bits, and AArch64.PAMax() is get_kvm_ipa_limit().

> +
> +/* Return the translation regime that applies to an AT instruction */
> +static enum trans_regime compute_translation_regime(struct kvm_vcpu *vcpu, u32 op)
> +{
> +	/*
> +	 * We only get here from guest EL2, so the translation
> +	 * regime AT applies to is solely defined by {E2H,TGE}.
> +	 */
> +	switch (op) {
> +	case OP_AT_S1E2R:
> +	case OP_AT_S1E2W:
> +		return vcpu_el2_e2h_is_set(vcpu) ? TR_EL20 : TR_EL2;

This matches the pseudocode for the instructions, which calls
AArch64.AT(el_in=EL2). AT(el_in=EL2) calls TranslationRegime(el=EL2),
which returns Regime_EL20 if E2H is set (in ELIsInHost(el=EL2)),
otherwise Regime_EL2.

> +		break;
> +	default:
> +		return (vcpu_el2_e2h_is_set(vcpu) &&
> +			vcpu_el2_tge_is_set(vcpu)) ? TR_EL20 : TR_EL10;

This also looks correct to me. Following the pseudocode was not
trivial, so I'm leaving this here in case I made a mistake somewhere.

For the S1E0* variants: AArch64.AT(el_in=EL0) => TranslationRegime(el=EL0)
=> ELIsInHost(el=EL0); ELIsInHost(el=EL0) is true if {E2H,TGE} = {1,1},
and in this case TranslationRegime(el=EL0) returns Regime_EL20,
otherwise Regime_EL10.

For the S1E1* variants: AArch64.AT(el_in=EL1), where:

- if ELIsInHost(el=EL0) is true, then 'el' is changed to EL2, where
  ELIsInHost(el=EL0) is true if {E2H,TGE} = {1,1}. In this case,
  TranslationRegime(el=EL2) will return Regime_EL20.

- if ELIsInHost(el=EL0) is false, then 'el' remains EL1, and
  TranslationRegime(el=EL1) returns Regime_EL10.
> +	}
> +}
> +
> +static int setup_s1_walk(struct kvm_vcpu *vcpu, u32 op, struct s1_walk_info *wi,
> +			 struct s1_walk_result *wr, u64 va)
> +{
> +	u64 sctlr, tcr, tg, ps, ia_bits, ttbr;
> +	unsigned int stride, x;
> +	bool va55, tbi, lva, as_el0;
> +
> +	wi->regime = compute_translation_regime(vcpu, op);
> +	as_el0 = (op == OP_AT_S1E0R || op == OP_AT_S1E0W);
> +
> +	va55 = va & BIT(55);
> +
> +	if (wi->regime == TR_EL2 && va55)
> +		goto addrsz;
> +
> +	wi->s2 = wi->regime == TR_EL10 && (__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_VM);

According to AArch64.NSS2TTWParams(), stage 2 is enabled if HCR_EL2.VM
or HCR_EL2.DC.