Re: [PATCH v2 00/24] KVM: arm64: Introduce pKVM shadow state at EL2

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 20 Jul 2022 21:17:00 +0000

On Wed, Jul 20, 2022, Will Deacon wrote:
> Hi Sean,
> 
> On Tue, Jul 19, 2022 at 04:11:32PM +0000, Sean Christopherson wrote:
> > Or maybe just "pkvm"?
> 
> I think the "hyp" part is useful to distinguish the pkvm code running at EL2
> from the pkvm code running at EL1. For example, we have a 'pkvm' member in
> 'struct kvm_arch' which is used by the _host_ at EL1.

Right, my suggestion was to rename that to pkvm_handle to avoid a direct conflict,
and then that naturally yields the "pkvm_handle => pkvm_vm" association.  Or are
you expecting to shove more stuff into the that "pkvm" struct?

> So I'd say either "pkvm_hyp" or "hyp" instead of "shadow". The latter is
> nice and short...

I 100% agree that differentating between EL1 and EL2 is important for functions,
structs and global variables, but I would argue it's not so important for fields
and local variables where the "owning" struct/function provides that context.  But
that's actually a partial argument for just using "hyp".

My concern with just using e.g. "kvm_hyp" is that, because non-pKVM nVHE also has
the host vs. hyp split, it could lead people to believe that "kvm_hyp" is also
used for the non-pKVM case.

So, what about a blend?  E.g. "struct pkvm_hyp_vcpu *hyp_vcpu".  That provides
the context that the struct is specific to the EL2 side of pKVM, most usage is
nice and short, and the "hyp" prefix avoids the ambiguity that a bare "pkvm" would
suffer for EL1 vs. EL2.

Doesn't look awful?

static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
{
	DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
	int ret;

	host_vcpu = kern_hyp_va(host_vcpu);

	if (unlikely(is_protected_kvm_enabled())) {
		struct pkvm_hyp_vcpu *hyp_vcpu;
		struct kvm *host_kvm;

		host_kvm = kern_hyp_va(host_vcpu->kvm);

		hyp_vcpu = pkvm_load_hyp_vcpu(host_kvm->arch.pkvm.handle,
					      host_vcpu->vcpu_idx);
		if (!hyp_vcpu) {
			ret = -EINVAL;
			goto out;
		}

		flush_pkvm_guest_state(hyp_vcpu);

		ret = __kvm_vcpu_run(shadow_vcpu);

		sync_pkvm_guest_state(hyp_vcpu);

		pkvm_put_hyp_vcpu(shadow_state);
	} else {
		/* The host is fully trusted, run its vCPU directly. */
		ret = __kvm_vcpu_run(host_vcpu);
	}

out:
	cpu_reg(host_ctxt, 1) =  ret;
}

> > I think that's especially viable if you do away with
> > kvm_shadow_vcpu_state.  As of this series at least, kvm_shadow_vcpu_state is
> > completely unnecessary.  kvm_vcpu.kvm can be used to get at the VM, and thus pKVM
> > state via container_of().  Then the host_vcpu can be retrieved by using the
> > vcpu_idx, e.g.
> > 
> > 	struct pkvm_vm *pkvm_vm = to_pkvm_vm(pkvm_vcpu->vm);
> > 	struct kvm_vcpu *host_vcpu;
> > 
> > 	host_vcpu = kvm_get_vcpu(pkvm_vm->host_vm, pkvm_vcpu->vcpu_idx);
> 
> Using container_of() here is neat; we can definitely go ahead with that
> change. However, looking at this in more detail with Fuad, removing
> 'struct kvm_shadow_vcpu_state' entirely isn't going to work:

> > struct kvm_vcpu *pkvm_vcpu_load(pkvm_handle_t handle, unsigned int vcpu_idx)
> > {
> > 	struct kvm_vpcu *pkvm_vcpu = NULL;
> > 	struct kvm *vm;
> > 
> > 	hyp_spin_lock(&pkvm_global_lock);
> > 	vm = pkvm_get_vm(handle);
> > 	if (!vm || atomic_read(&vm->online_vcpus) <= vcpu_idx)
> > 		goto unlock;
> > 
> > 	pkvm_vcpu = kvm_get_vcpu(vm, vcpu_idx);
> 
> kvm_get_vcpu() makes use of an xarray to hold the vCPUs pointers and this is
> really something which we cannot support at EL2 where, amongst other things,
> we do not have support for RCU. Consequently, we do need to keep our own
> mapping from the shad^H^H^H^Hhyp vCPU to the host vCPU.

Hmm, are there guardrails in place to prevent using "unsafe" fields from "struct kvm"
and "struct kvm_vcpu" at EL2?  If not, it seems like embedding the common structs
in the hyp/pkvm-specific structs is going bite us in the rear at some point.

Mostly out of curiosity, I assume the EL2 restriction only applies to nVHE mode?

And waaaay off topic, has anyone explored adding macro magic to generate wrappers
to (un)marshall registers to parameters/returns for the hyp functions?  E.g. it'd
be neat if you could make the code look like this without having to add a wrapper
for every function:

static int handle___kvm_vcpu_run(unsigned long __host_vcpu)
{
	struct kvm_vcpu *host_vcpu = kern_hyp_va(__host_vcpu);
	int ret;

	if (unlikely(is_protected_kvm_enabled())) {
		struct pkvm_hyp_vcpu *hyp_vcpu;
		struct kvm *host_kvm;

		host_kvm = kern_hyp_va(host_vcpu->kvm);

		hyp_vcpu = pkvm_load_hyp_vcpu(host_kvm->arch.pkvm.handle,
					      host_vcpu->vcpu_idx);
		if (!hyp_vcpu)
			return -EINVAL;

		flush_hypervisor_state(hyp_vcpu);

		ret = __kvm_vcpu_run(shadow_vcpu);

		sync_hypervisor_state(hyp_vcpu);
		pkvm_put_hyp_vcpu(shadow_state);
	} else {
		/* The host is fully trusted, run its vCPU directly. */
		ret = __kvm_vcpu_run(host_vcpu);
	}
	return ret;
}