Re: [PATCH 10/12] KVM: arm64: nv: Add SW walker for AT S1 emulation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 11 Jul 2024 11:56:13 +0100,
Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote:
> 
> Hi,
> 
> On Mon, Jul 08, 2024 at 05:57:58PM +0100, Marc Zyngier wrote:
> > In order to plug the brokenness of our current AT implementation,
> > we need a SW walker that is going to... err.. walk the S1 tables
> > and tell us what it finds.
> > 
> > Of course, it builds on top of our S2 walker, and share similar
> > concepts. The beauty of it is that since it uses kvm_read_guest(),
> > it is able to bring back pages that have been otherwise evicted.
> > 
> > This is then plugged in the two AT S1 emulation functions as
> > a "slow path" fallback. I'm not sure it is that slow, but hey.
> > [..]
> >  	switch (op) {
> >  	case OP_AT_S1E1RP:
> >  	case OP_AT_S1E1WP:
> > +		retry_slow = false;
> >  		fail = check_at_pan(vcpu, vaddr, &par);
> >  		break;
> >  	default:
> >  		goto nopan;
> >  	}
> 
> For context, this is what check_at_pan() does:
> 
> static int check_at_pan(struct kvm_vcpu *vcpu, u64 vaddr, u64 *res)
> {
>         u64 par_e0;
>         int error;
> 
>         /*
>          * For PAN-involved AT operations, perform the same translation,
>          * using EL0 this time. Twice. Much fun.
>          */
>         error = __kvm_at(OP_AT_S1E0R, vaddr);
>         if (error)
>                 return error;
> 
>         par_e0 = read_sysreg_par();
>         if (!(par_e0 & SYS_PAR_EL1_F))
>                 goto out;
> 
>         error = __kvm_at(OP_AT_S1E0W, vaddr);
>         if (error)
>                 return error;
> 
>         par_e0 = read_sysreg_par();
> out:
>         *res = par_e0;
>         return 0;
> }
> 
> I'm having a hard time understanding why KVM is doing both AT S1E0R and AT S1E0W
> regardless of the type of the access (read/write) in the PAN-aware AT
> instruction. Would you mind elaborating on that?

Because that's the very definition of an AT S1E1{W,R}P instruction
when PAN is set. If *any* EL0 permission is set, then the translation
must equally fail. Just like a load or a store from EL1 would fail if
any EL0 permission is set when PSTATE.PAN is set.

Since we cannot check for both permissions at once, we do it twice.
It is worth noting that we don't quite handle the PAN3 case correctly
(because we can't retrieve the *execution* property using AT). I'll
add that to the list of stuff to fix.

> 
> > +	if (fail) {
> > +		vcpu_write_sys_reg(vcpu, SYS_PAR_EL1_F, PAR_EL1);
> > +		goto nopan;
> > +	}
> > [..]
> > +	if (par & SYS_PAR_EL1_F) {
> > +		u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par);
> > +
> > +		/*
> > +		 * If we get something other than a permission fault, we
> > +		 * need to retry, as we're likely to have missed in the PTs.
> > +		 */
> > +		if ((fst & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_PERM)
> > +			retry_slow = true;
> > +	} else {
> > +		/*
> > +		 * The EL0 access succeded, but we don't have the full
> > +		 * syndrom information to synthetize the failure. Go slow.
> > +		 */
> > +		retry_slow = true;
> > +	}
> 
> This is what PSTATE.PAN controls:
> 
> If the Effective value of PSTATE.PAN is 1, then a privileged data access from
> any of the following Exception levels to a virtual memory address that is
> accessible to data accesses at EL0 generates a stage 1 Permission fault:
> 
> - A privileged data access from EL1.
> - If HCR_EL2.E2H is 1, then a privileged data access from EL2.
> 
> With that in mind, I am really struggling to understand the logic.

I don't quite see what you don't understand, you'll have to be more
precise. Are you worried about the page tables we're looking at, the
value of PSTATE.PAN, the permission fault, or something else?

It also doesn't help that you're looking at the patch that contains
the integration with the slow-path, which is pretty hard to read (I
have a reworked version that's a bit better). You probably want to
look at the "fast" path alone.

> 
> If AT S1E0{R,W} (from check_at_pan()) failed, doesn't that mean that the virtual
> memory address is not accessible to EL0? Add that to the fact that the AT
> S1E1{R,W} (from the beginning of __kvm_at_s1e01()) succeeded, doesn't that mean
> that AT S1E1{R,W}P should succeed, and furthermore the PAR_EL1 value should be
> the one KVM got from AT S1E1{R,W}?

There are plenty of ways for AT S1E0 to fail when AT S1E1 succeeded:

- no EL0 permission: that's the best case, and the PAR_EL1 obtained
  from the AT S1E1 is the correct one. That's what we return.

- The EL0 access failed, but for another reason than a permission
  fault. This contradicts the EL1 walk, and is a sure sign that
  someone is playing behind our back. We fail.

- exception from AT S1E0: something went wrong (again the guest
  playing with the PTs behind our back). We fail as well.

Do you at least agree with these as goals? If you do, what in
the implementation does not satisfy these goals? If you don't, what in
these goals seem improper to you?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux