Hi, On Mon, Jul 08, 2024 at 05:57:58PM +0100, Marc Zyngier wrote: > In order to plug the brokenness of our current AT implementation, > we need a SW walker that is going to... err.. walk the S1 tables > and tell us what it finds. > > Of course, it builds on top of our S2 walker, and share similar > concepts. The beauty of it is that since it uses kvm_read_guest(), > it is able to bring back pages that have been otherwise evicted. > > This is then plugged in the two AT S1 emulation functions as > a "slow path" fallback. I'm not sure it is that slow, but hey. > [..] > switch (op) { > case OP_AT_S1E1RP: > case OP_AT_S1E1WP: > + retry_slow = false; > fail = check_at_pan(vcpu, vaddr, &par); > break; > default: > goto nopan; > } For context, this is what check_at_pan() does: static int check_at_pan(struct kvm_vcpu *vcpu, u64 vaddr, u64 *res) { u64 par_e0; int error; /* * For PAN-involved AT operations, perform the same translation, * using EL0 this time. Twice. Much fun. */ error = __kvm_at(OP_AT_S1E0R, vaddr); if (error) return error; par_e0 = read_sysreg_par(); if (!(par_e0 & SYS_PAR_EL1_F)) goto out; error = __kvm_at(OP_AT_S1E0W, vaddr); if (error) return error; par_e0 = read_sysreg_par(); out: *res = par_e0; return 0; } I'm having a hard time understanding why KVM is doing both AT S1E0R and AT S1E0W regardless of the type of the access (read/write) in the PAN-aware AT instruction. Would you mind elaborating on that? > + if (fail) { > + vcpu_write_sys_reg(vcpu, SYS_PAR_EL1_F, PAR_EL1); > + goto nopan; > + } > [..] > + if (par & SYS_PAR_EL1_F) { > + u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par); > + > + /* > + * If we get something other than a permission fault, we > + * need to retry, as we're likely to have missed in the PTs. > + */ > + if ((fst & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_PERM) > + retry_slow = true; > + } else { > + /* > + * The EL0 access succeded, but we don't have the full > + * syndrom information to synthetize the failure. Go slow. > + */ > + retry_slow = true; > + } This is what PSTATE.PAN controls: If the Effective value of PSTATE.PAN is 1, then a privileged data access from any of the following Exception levels to a virtual memory address that is accessible to data accesses at EL0 generates a stage 1 Permission fault: - A privileged data access from EL1. - If HCR_EL2.E2H is 1, then a privileged data access from EL2. With that in mind, I am really struggling to understand the logic. If AT S1E0{R,W} (from check_at_pan()) failed, doesn't that mean that the virtual memory address is not accessible to EL0? Add that to the fact that the AT S1E1{R,W} (from the beginning of __kvm_at_s1e01()) succeeded, doesn't that mean that AT S1E1{R,W}P should succeed, and furthermore the PAR_EL1 value should be the one KVM got from AT S1E1{R,W}? Thanks, Alex > nopan: > __mmu_config_restore(&config); > out: > local_irq_restore(flags); > > write_unlock(&vcpu->kvm->mmu_lock); > + > + /* > + * If retry_slow is true, then we either are missing shadow S2 > + * entries, have paged out guest S1, or something is inconsistent. > + * > + * Either way, we need to walk the PTs by hand so that we can either > + * fault things back, in or record accurate fault information along > + * the way. > + */ > + if (retry_slow) { > + par = handle_at_slow(vcpu, op, vaddr); > + vcpu_write_sys_reg(vcpu, par, PAR_EL1); > + } > } > > void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr) > @@ -433,6 +931,10 @@ void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr) > > write_unlock(&vcpu->kvm->mmu_lock); > > + /* We failed the translation, let's replay it in slow motion */ > + if (!fail && (par & SYS_PAR_EL1_F)) > + par = handle_at_slow(vcpu, op, vaddr); > + > vcpu_write_sys_reg(vcpu, par, PAR_EL1); > } > > -- > 2.39.2 > >