Hi Marc, On Mon, Jul 08, 2024 at 05:57:58PM +0100, Marc Zyngier wrote: > In order to plug the brokenness of our current AT implementation, > we need a SW walker that is going to... err.. walk the S1 tables > and tell us what it finds. > > Of course, it builds on top of our S2 walker, and share similar > concepts. The beauty of it is that since it uses kvm_read_guest(), > it is able to bring back pages that have been otherwise evicted. > > This is then plugged in the two AT S1 emulation functions as > a "slow path" fallback. I'm not sure it is that slow, but hey. > > Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx> > [..] > @@ -331,18 +801,17 @@ void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr) > } > > if (!fail) > - par = read_sysreg(par_el1); > + par = read_sysreg_par(); > else > par = SYS_PAR_EL1_F; > > + retry_slow = !fail; > + > vcpu_write_sys_reg(vcpu, par, PAR_EL1); > > /* > - * Failed? let's leave the building now. > - * > - * FIXME: how about a failed translation because the shadow S2 > - * wasn't populated? We may need to perform a SW PTW, > - * populating our shadow S2 and retry the instruction. > + * Failed? let's leave the building now, unless we retry on > + * the slow path. > */ > if (par & SYS_PAR_EL1_F) > goto nopan; This is what follows after the 'if' statement above, and before the 'switch' below: /* No PAN? No problem. */ if (!(*vcpu_cpsr(vcpu) & PSR_PAN_BIT)) goto nopan; When KVM is executing this statement, the following is true: 1. SYS_PAR_EL1_F is clear => the hardware translation table walk was successful. 2. retry_slow = true; Then if the PAN bit is not set, the function jumps to the nopan label, and performs a software translation table walk, even though the hardware walk performed by AT was successful. > @@ -354,29 +823,58 @@ void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr) > switch (op) { > case OP_AT_S1E1RP: > case OP_AT_S1E1WP: > + retry_slow = false; > fail = check_at_pan(vcpu, vaddr, &par); > break; > default: > goto nopan; > } > > + if (fail) { > + vcpu_write_sys_reg(vcpu, SYS_PAR_EL1_F, PAR_EL1); > + goto nopan; > + } > + > /* > * If the EL0 translation has succeeded, we need to pretend > * the AT operation has failed, as the PAN setting forbids > * such a translation. > - * > - * FIXME: we hardcode a Level-3 permission fault. We really > - * should return the real fault level. > */ > - if (fail || !(par & SYS_PAR_EL1_F)) > - vcpu_write_sys_reg(vcpu, (0xf << 1) | SYS_PAR_EL1_F, PAR_EL1); > - > + if (par & SYS_PAR_EL1_F) { > + u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par); > + > + /* > + * If we get something other than a permission fault, we > + * need to retry, as we're likely to have missed in the PTs. > + */ > + if ((fst & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_PERM) > + retry_slow = true; Shouldn't VCPU's PAR_EL1 register be updated here? As far as I can tell, at this point the VCPU PAR_EL1 register has the result from the successful walk performed by AT S1E1R or AT S1E1W in the first 'switch' statement. Thanks, Alex > + } else { > + /* > + * The EL0 access succeded, but we don't have the full > + * syndrom information to synthetize the failure. Go slow. > + */ > + retry_slow = true; > + } > nopan: > __mmu_config_restore(&config); > out: > local_irq_restore(flags); > > write_unlock(&vcpu->kvm->mmu_lock); > + > + /* > + * If retry_slow is true, then we either are missing shadow S2 > + * entries, have paged out guest S1, or something is inconsistent. > + * > + * Either way, we need to walk the PTs by hand so that we can either > + * fault things back, in or record accurate fault information along > + * the way. > + */ > + if (retry_slow) { > + par = handle_at_slow(vcpu, op, vaddr); > + vcpu_write_sys_reg(vcpu, par, PAR_EL1); > + } > }