Re: [PATCH v11 024/113] KVM: TDX: Do TDX specific vcpu initialization

Isaku Yamahata <isaku.yamahata@xxxxxxxxx> · Tue, 28 Feb 2023 03:27:16 -0800

On Thu, Jan 19, 2023 at 10:37:43AM +0000,
"Huang, Kai" <kai.huang@xxxxxxxxx> wrote:

> On Thu, 2023-01-12 at 08:31 -0800, isaku.yamahata@xxxxxxxxx wrote:
> > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > 
> > TD guest vcpu need to be configured before ready to run which requests
> > addtional information from Device model (e.g. qemu), one 64bit value is
> > passed to vcpu's RCX as an initial value.  
> > 
> 
> The first half sentence doesn't parse to me.  It also has grammar issue.
> 
> Also, the second half only talks about TDH.VP.INIT, but there's more regarding
> to creating/initializing a TDX guest vcpu.  IMHO It would be better if you can
> briefly describe the whole sequence here so people can get some idea about your
> code below.
> 
> Btw, I don't understand what's the point of pointing out "64bit value passed to
> vcpu's RCX ...".  You can add this to the comment instead.  If it is important,
> then please add more to explain it so people can understand more. 
> 

RCX and 64bit value doesn't make much sense in the commit message. I dropped
those sentence.

Here is the updated commit message.

    KVM: TDX: Do TDX specific vcpu initialization

    TD guest vcpu needs TDX specific initialization before running.  Repurpose
    KVM_MEMORY_ENCRYPT_OP to vcpu-scope, add a new sub-command
    KVM_TDX_INIT_VCPU, and implement the callback for it.

> > Repurpose KVM_MEMORY_ENCRYPT_OP
> > to vcpu-scope and add new sub-commands KVM_TDX_INIT_VCPU under it for such
> > additional vcpu configuration.
> 
> I am not sure using the same command for both per-VM and per-vcpu ioctls is a
> good idea.  Is there any existing example does this?

There are some. Please break Documentation/virt/kvm/api.rst with "type:".
You can see some ioctl supports multiple type.

  Type: system ioctl, vm ioctl
  Type: vcpu ioctl / vm ioctl
  Type: device ioctl, vm ioctl, vcpu ioctl

> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 099f0737a5aa..e2f5a07ad4e5 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -49,6 +49,11 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid)
> >  	return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits);
> >  }
> >  
> > +static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx)
> > +{
> > +	return tdx->tdvpr_pa;
> > +}
> > +
> >  static inline bool is_td_created(struct kvm_tdx *kvm_tdx)
> >  {
> >  	return kvm_tdx->tdr_pa;
> > @@ -65,6 +70,11 @@ static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx)
> >  	return kvm_tdx->hkid > 0;
> >  }
> >  
> > +static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx)
> > +{
> > +	return kvm_tdx->finalized;
> > +}
> > +
> >  static void tdx_clear_page(unsigned long page_pa)
> >  {
> >  	const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
> > @@ -327,7 +337,21 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> >  
> >  void tdx_vcpu_free(struct kvm_vcpu *vcpu)
> >  {
> > -	/* This is stub for now.  More logic will come. */
> > +	struct vcpu_tdx *tdx = to_tdx(vcpu);
> > +	int i;
> > +
> > +	/* Can't reclaim or free pages if teardown failed. */
> > +	if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm)))
> > +		return;
> 
> You may want to WARN() if it's a kernel bug you want to catch.

No, it's not a bug to come here with hkid freed because vcpus_free method
can be called after vm destruction.

> > +
> > +	if (tdx->tdvpx_pa) {
> > +		for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++)
> > +			tdx_reclaim_td_page(tdx->tdvpx_pa[i]);
> > +		kfree(tdx->tdvpx_pa);
> > +		tdx->tdvpx_pa = NULL;
> > +	}
> > +	tdx_reclaim_td_page(tdx->tdvpr_pa);
> > +	tdx->tdvpr_pa = 0;
> >  }
> >  
> >  void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > @@ -337,6 +361,8 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> >  	/* TDX doesn't support INIT event. */
> >  	if (WARN_ON_ONCE(init_event))
> >  		goto td_bugged;
> > +	if (WARN_ON_ONCE(is_td_vcpu_created(to_tdx(vcpu))))
> > +		goto td_bugged;
> 
> Again, not sure can we use KVM_BUG_ON()?

I converted it into KVM_BUG_ON()

> >  	/* TDX rquires X2APIC. */
> >  	apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC;
> > @@ -791,6 +817,125 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp)
> >  	return r;
> >  }
> >  
> > +static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
> > +{
> > +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> > +	struct vcpu_tdx *tdx = to_tdx(vcpu);
> > +	unsigned long *tdvpx_pa = NULL;
> > +	unsigned long tdvpr_pa;
> > +	unsigned long va;
> > +	int ret, i;
> > +	u64 err;
> > +
> > +	if (is_td_vcpu_created(tdx))
> > +		return -EINVAL;
> 
> Ditto.  WARN()?

No. KVM_TDX_INIT_VCPU can be called multiple times.  It's not kernel bug, but
misuse of the ioctl.

> > +
> > +	va = __get_free_page(GFP_KERNEL_ACCOUNT);
> > +	if (!va)
> > +		return -ENOMEM;
> > +	tdvpr_pa = __pa(va);
> > +
> > +	tdvpx_pa = kcalloc(tdx_caps.tdvpx_nr_pages, sizeof(*tdx->tdvpx_pa),
> > +			   GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> 
> kcalloc() uses __GFP_ZERO internally.
> 
> > +	if (!tdvpx_pa) {
> > +		ret = -ENOMEM;
> > +		goto free_tdvpr;
> > +	}
> > +	for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) {
> > +		va = __get_free_page(GFP_KERNEL_ACCOUNT);
> > +		if (!va)
> > +			goto free_tdvpx;
> > +		tdvpx_pa[i] = __pa(va);
> > +	}
> > +
> > +	err = tdh_vp_create(kvm_tdx->tdr_pa, tdvpr_pa);
> > +	if (WARN_ON_ONCE(err)) {
> > +		ret = -EIO;
> > +		pr_tdx_error(TDH_VP_CREATE, err, NULL);
> > +		goto td_bugged_free_tdvpx;
> > +	}
> > +	tdx->tdvpr_pa = tdvpr_pa;
> > +
> > +	tdx->tdvpx_pa = tdvpx_pa;
> > +	for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) {
> > +		err = tdh_vp_addcx(tdx->tdvpr_pa, tdvpx_pa[i]);
> > +		if (WARN_ON_ONCE(err)) {
> > +			ret = -EIO;
> > +			pr_tdx_error(TDH_VP_ADDCX, err, NULL);
> > +			for (; i < tdx_caps.tdvpx_nr_pages; i++) {
> > +				free_page((unsigned long)__va(tdvpx_pa[i]));
> > +				tdvpx_pa[i] = 0;
> > +			}
> > +			goto td_bugged;
> > +		}
> > +	}
> > +
> > +	err = tdh_vp_init(tdx->tdvpr_pa, vcpu_rcx);
> > +	if (WARN_ON_ONCE(err)) {
> > +		ret = -EIO;
> > +		pr_tdx_error(TDH_VP_INIT, err, NULL);
> > +		goto td_bugged;
> > +	}
> > +
> > +	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> > +
> > +	return 0;
> > +
> > +td_bugged_free_tdvpx:
> > +	for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) {
> > +		free_page((unsigned long)__va(tdvpx_pa[i]));
> > +		tdvpx_pa[i] = 0;
> > +	}
> > +	kfree(tdvpx_pa);
> > +td_bugged:
> > +	vcpu->kvm->vm_bugged = true;
> > +	return ret;
> > +
> > +free_tdvpx:
> > +	for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++)
> > +		if (tdvpx_pa[i])
> > +			free_page((unsigned long)__va(tdvpx_pa[i]));
> > +	kfree(tdvpx_pa);
> 
> This piece of code appears 3 times in this function (and there are 3 'return
> ret;').  I am sure it can be done in one place instead. Can you reorganize?

I improved the logic to delete the duplication.
-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>