On Thu, Jan 19, 2023 at 10:37:43AM +0000, "Huang, Kai" <kai.huang@xxxxxxxxx> wrote: > On Thu, 2023-01-12 at 08:31 -0800, isaku.yamahata@xxxxxxxxx wrote: > > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx> > > > > TD guest vcpu need to be configured before ready to run which requests > > addtional information from Device model (e.g. qemu), one 64bit value is > > passed to vcpu's RCX as an initial value. > > > > The first half sentence doesn't parse to me. It also has grammar issue. > > Also, the second half only talks about TDH.VP.INIT, but there's more regarding > to creating/initializing a TDX guest vcpu. IMHO It would be better if you can > briefly describe the whole sequence here so people can get some idea about your > code below. > > Btw, I don't understand what's the point of pointing out "64bit value passed to > vcpu's RCX ...". You can add this to the comment instead. If it is important, > then please add more to explain it so people can understand more. > RCX and 64bit value doesn't make much sense in the commit message. I dropped those sentence. Here is the updated commit message. KVM: TDX: Do TDX specific vcpu initialization TD guest vcpu needs TDX specific initialization before running. Repurpose KVM_MEMORY_ENCRYPT_OP to vcpu-scope, add a new sub-command KVM_TDX_INIT_VCPU, and implement the callback for it. > > Repurpose KVM_MEMORY_ENCRYPT_OP > > to vcpu-scope and add new sub-commands KVM_TDX_INIT_VCPU under it for such > > additional vcpu configuration. > > I am not sure using the same command for both per-VM and per-vcpu ioctls is a > good idea. Is there any existing example does this? There are some. Please break Documentation/virt/kvm/api.rst with "type:". You can see some ioctl supports multiple type. Type: system ioctl, vm ioctl Type: vcpu ioctl / vm ioctl Type: device ioctl, vm ioctl, vcpu ioctl > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > > index 099f0737a5aa..e2f5a07ad4e5 100644 > > --- a/arch/x86/kvm/vmx/tdx.c > > +++ b/arch/x86/kvm/vmx/tdx.c > > @@ -49,6 +49,11 @@ static __always_inline hpa_t set_hkid_to_hpa(hpa_t pa, u16 hkid) > > return pa | ((hpa_t)hkid << boot_cpu_data.x86_phys_bits); > > } > > > > +static inline bool is_td_vcpu_created(struct vcpu_tdx *tdx) > > +{ > > + return tdx->tdvpr_pa; > > +} > > + > > static inline bool is_td_created(struct kvm_tdx *kvm_tdx) > > { > > return kvm_tdx->tdr_pa; > > @@ -65,6 +70,11 @@ static inline bool is_hkid_assigned(struct kvm_tdx *kvm_tdx) > > return kvm_tdx->hkid > 0; > > } > > > > +static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx) > > +{ > > + return kvm_tdx->finalized; > > +} > > + > > static void tdx_clear_page(unsigned long page_pa) > > { > > const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0))); > > @@ -327,7 +337,21 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) > > > > void tdx_vcpu_free(struct kvm_vcpu *vcpu) > > { > > - /* This is stub for now. More logic will come. */ > > + struct vcpu_tdx *tdx = to_tdx(vcpu); > > + int i; > > + > > + /* Can't reclaim or free pages if teardown failed. */ > > + if (is_hkid_assigned(to_kvm_tdx(vcpu->kvm))) > > + return; > > You may want to WARN() if it's a kernel bug you want to catch. No, it's not a bug to come here with hkid freed because vcpus_free method can be called after vm destruction. > > + > > + if (tdx->tdvpx_pa) { > > + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) > > + tdx_reclaim_td_page(tdx->tdvpx_pa[i]); > > + kfree(tdx->tdvpx_pa); > > + tdx->tdvpx_pa = NULL; > > + } > > + tdx_reclaim_td_page(tdx->tdvpr_pa); > > + tdx->tdvpr_pa = 0; > > } > > > > void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) > > @@ -337,6 +361,8 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) > > /* TDX doesn't support INIT event. */ > > if (WARN_ON_ONCE(init_event)) > > goto td_bugged; > > + if (WARN_ON_ONCE(is_td_vcpu_created(to_tdx(vcpu)))) > > + goto td_bugged; > > Again, not sure can we use KVM_BUG_ON()? I converted it into KVM_BUG_ON() > > /* TDX rquires X2APIC. */ > > apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC; > > @@ -791,6 +817,125 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) > > return r; > > } > > > > +static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx) > > +{ > > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); > > + struct vcpu_tdx *tdx = to_tdx(vcpu); > > + unsigned long *tdvpx_pa = NULL; > > + unsigned long tdvpr_pa; > > + unsigned long va; > > + int ret, i; > > + u64 err; > > + > > + if (is_td_vcpu_created(tdx)) > > + return -EINVAL; > > Ditto. WARN()? No. KVM_TDX_INIT_VCPU can be called multiple times. It's not kernel bug, but misuse of the ioctl. > > + > > + va = __get_free_page(GFP_KERNEL_ACCOUNT); > > + if (!va) > > + return -ENOMEM; > > + tdvpr_pa = __pa(va); > > + > > + tdvpx_pa = kcalloc(tdx_caps.tdvpx_nr_pages, sizeof(*tdx->tdvpx_pa), > > + GFP_KERNEL_ACCOUNT | __GFP_ZERO); > > kcalloc() uses __GFP_ZERO internally. > > > + if (!tdvpx_pa) { > > + ret = -ENOMEM; > > + goto free_tdvpr; > > + } > > + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) { > > + va = __get_free_page(GFP_KERNEL_ACCOUNT); > > + if (!va) > > + goto free_tdvpx; > > + tdvpx_pa[i] = __pa(va); > > + } > > + > > + err = tdh_vp_create(kvm_tdx->tdr_pa, tdvpr_pa); > > + if (WARN_ON_ONCE(err)) { > > + ret = -EIO; > > + pr_tdx_error(TDH_VP_CREATE, err, NULL); > > + goto td_bugged_free_tdvpx; > > + } > > + tdx->tdvpr_pa = tdvpr_pa; > > + > > + tdx->tdvpx_pa = tdvpx_pa; > > + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) { > > + err = tdh_vp_addcx(tdx->tdvpr_pa, tdvpx_pa[i]); > > + if (WARN_ON_ONCE(err)) { > > + ret = -EIO; > > + pr_tdx_error(TDH_VP_ADDCX, err, NULL); > > + for (; i < tdx_caps.tdvpx_nr_pages; i++) { > > + free_page((unsigned long)__va(tdvpx_pa[i])); > > + tdvpx_pa[i] = 0; > > + } > > + goto td_bugged; > > + } > > + } > > + > > + err = tdh_vp_init(tdx->tdvpr_pa, vcpu_rcx); > > + if (WARN_ON_ONCE(err)) { > > + ret = -EIO; > > + pr_tdx_error(TDH_VP_INIT, err, NULL); > > + goto td_bugged; > > + } > > + > > + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; > > + > > + return 0; > > + > > +td_bugged_free_tdvpx: > > + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) { > > + free_page((unsigned long)__va(tdvpx_pa[i])); > > + tdvpx_pa[i] = 0; > > + } > > + kfree(tdvpx_pa); > > +td_bugged: > > + vcpu->kvm->vm_bugged = true; > > + return ret; > > + > > +free_tdvpx: > > + for (i = 0; i < tdx_caps.tdvpx_nr_pages; i++) > > + if (tdvpx_pa[i]) > > + free_page((unsigned long)__va(tdvpx_pa[i])); > > + kfree(tdvpx_pa); > > This piece of code appears 3 times in this function (and there are 3 'return > ret;'). I am sure it can be done in one place instead. Can you reorganize? I improved the logic to delete the duplication. -- Isaku Yamahata <isaku.yamahata@xxxxxxxxx>