On Thu, 11 Jul 2024 11:56:13 +0100, Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote: > > Hi, > > On Mon, Jul 08, 2024 at 05:57:58PM +0100, Marc Zyngier wrote: > > In order to plug the brokenness of our current AT implementation, > > we need a SW walker that is going to... err.. walk the S1 tables > > and tell us what it finds. > > > > Of course, it builds on top of our S2 walker, and share similar > > concepts. The beauty of it is that since it uses kvm_read_guest(), > > it is able to bring back pages that have been otherwise evicted. > > > > This is then plugged in the two AT S1 emulation functions as > > a "slow path" fallback. I'm not sure it is that slow, but hey. > > [..] > > switch (op) { > > case OP_AT_S1E1RP: > > case OP_AT_S1E1WP: > > + retry_slow = false; > > fail = check_at_pan(vcpu, vaddr, &par); > > break; > > default: > > goto nopan; > > } > > For context, this is what check_at_pan() does: > > static int check_at_pan(struct kvm_vcpu *vcpu, u64 vaddr, u64 *res) > { > u64 par_e0; > int error; > > /* > * For PAN-involved AT operations, perform the same translation, > * using EL0 this time. Twice. Much fun. > */ > error = __kvm_at(OP_AT_S1E0R, vaddr); > if (error) > return error; > > par_e0 = read_sysreg_par(); > if (!(par_e0 & SYS_PAR_EL1_F)) > goto out; > > error = __kvm_at(OP_AT_S1E0W, vaddr); > if (error) > return error; > > par_e0 = read_sysreg_par(); > out: > *res = par_e0; > return 0; > } > > I'm having a hard time understanding why KVM is doing both AT S1E0R and AT S1E0W > regardless of the type of the access (read/write) in the PAN-aware AT > instruction. Would you mind elaborating on that? Because that's the very definition of an AT S1E1{W,R}P instruction when PAN is set. If *any* EL0 permission is set, then the translation must equally fail. Just like a load or a store from EL1 would fail if any EL0 permission is set when PSTATE.PAN is set. Since we cannot check for both permissions at once, we do it twice. It is worth noting that we don't quite handle the PAN3 case correctly (because we can't retrieve the *execution* property using AT). I'll add that to the list of stuff to fix. > > > + if (fail) { > > + vcpu_write_sys_reg(vcpu, SYS_PAR_EL1_F, PAR_EL1); > > + goto nopan; > > + } > > [..] > > + if (par & SYS_PAR_EL1_F) { > > + u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par); > > + > > + /* > > + * If we get something other than a permission fault, we > > + * need to retry, as we're likely to have missed in the PTs. > > + */ > > + if ((fst & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_PERM) > > + retry_slow = true; > > + } else { > > + /* > > + * The EL0 access succeded, but we don't have the full > > + * syndrom information to synthetize the failure. Go slow. > > + */ > > + retry_slow = true; > > + } > > This is what PSTATE.PAN controls: > > If the Effective value of PSTATE.PAN is 1, then a privileged data access from > any of the following Exception levels to a virtual memory address that is > accessible to data accesses at EL0 generates a stage 1 Permission fault: > > - A privileged data access from EL1. > - If HCR_EL2.E2H is 1, then a privileged data access from EL2. > > With that in mind, I am really struggling to understand the logic. I don't quite see what you don't understand, you'll have to be more precise. Are you worried about the page tables we're looking at, the value of PSTATE.PAN, the permission fault, or something else? It also doesn't help that you're looking at the patch that contains the integration with the slow-path, which is pretty hard to read (I have a reworked version that's a bit better). You probably want to look at the "fast" path alone. > > If AT S1E0{R,W} (from check_at_pan()) failed, doesn't that mean that the virtual > memory address is not accessible to EL0? Add that to the fact that the AT > S1E1{R,W} (from the beginning of __kvm_at_s1e01()) succeeded, doesn't that mean > that AT S1E1{R,W}P should succeed, and furthermore the PAR_EL1 value should be > the one KVM got from AT S1E1{R,W}? There are plenty of ways for AT S1E0 to fail when AT S1E1 succeeded: - no EL0 permission: that's the best case, and the PAR_EL1 obtained from the AT S1E1 is the correct one. That's what we return. - The EL0 access failed, but for another reason than a permission fault. This contradicts the EL1 walk, and is a sure sign that someone is playing behind our back. We fail. - exception from AT S1E0: something went wrong (again the guest playing with the PTs behind our back). We fail as well. Do you at least agree with these as goals? If you do, what in the implementation does not satisfy these goals? If you don't, what in these goals seem improper to you? Thanks, M. -- Without deviation from the norm, progress is not possible.