Re: [PATCH v13 019/113] KVM: TDX: create/destroy VM structure

Zhi Wang <zhi.wang.linux@xxxxxxxxx> · Thu, 6 Apr 2023 19:04:51 +0300

On Wed, 5 Apr 2023 11:16:18 -0700
Isaku Yamahata <isaku.yamahata@xxxxxxxxx> wrote:

> On Sun, Apr 02, 2023 at 11:41:58AM +0300,
> Zhi Wang <zhi.wang.linux@xxxxxxxxx> wrote:
> 
> > > > > +void tdx_mmu_release_hkid(struct kvm *kvm)
> > > > > +{
> > > > > +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > > > > +	cpumask_var_t packages;
> > > > > +	bool cpumask_allocated;
> > > > > +	u64 err;
> > > > > +	int ret;
> > > > > +	int i;
> > > > > +
> > > > > +	if (!is_hkid_assigned(kvm_tdx))
> > > > > +		return;
> > > > > +
> > > > > +	if (!is_td_created(kvm_tdx))
> > > > > +		goto free_hkid;
> > > > > +
> > > > > +	cpumask_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
> > > > > +	cpus_read_lock();
> > > > > +	for_each_online_cpu(i) {
> > > > > +		if (cpumask_allocated &&
> > > > > +			cpumask_test_and_set_cpu(topology_physical_package_id(i),
> > > > > +						packages))
> > > > > +			continue;
> > > > 
> > > > > Is it necessary to check cpumask_allocated inside the loop? If the
> > > > > cpumask is not successfully allocated, wouldn't it be better to bail out
> > > > > right after the allocation?
> > > 
> > > > No, because we can't return an error here.  It's better to free resources
> > > > inefficiently than to leak them.
> > > 
> > > We can move the check out of loop. But it would be ugly
> > > (if () {cpu loop} else {cpu loop} ) and this function isn't performance
> > > critical.  Also I think it's okay to depend on compiler optimization for loop
> > > invariant. My compiler didn't optimize it in this case, though.
> > > 
> > 
> > > Do you mean that, if allocating the cpumask var fails, tdh_mng_key_freeid()
> > > is still required and we do TDH.PHYMEM.CACHE.WB (WBINVD) on each CPU?
> > 
> > > Out of curiosity, I took a look at the TDX module source code [1]. It seems
> > > the TDX module has an additional check in TDH.MNG.KEY.FREEID.
> > > TDH.MNG.VPFLUSHDONE [2] will mark the pending WBINVD in a bitmap:
> > 
> > ...
> > /**
> >      * Create the WBINVD_BITMAP per-package.
> >      * Set to 1 num_of_pkgs bits from the LSB
> >      */
> >     global_data_ptr->kot.entries[curr_hkid].wbinvd_bitmap = global_data_ptr->pkg_config_bitmap; /* <----HERE */
> > 
> >     // Set new TD life cycle state
> >     tdr_ptr->management_fields.lifecycle_state = TD_BLOCKED;
> > 
> >     // Set the proper new KOT entry state
> >     global_data_ptr->kot.entries[curr_hkid].state = (uint8_t)KOT_STATE_HKID_FLUSHED; 
> > ...
> > 
> > And TDH.MNG.KEY.FREEID [3] will check if the pending WBINVD has been performed:
> > 
> > ...
> >     /**
> >      * If TDH_PHYMEM_CACHE_WB was executed on all packages/cores,
> >      * set the KOT entry, set the KOT entry state to HKID_FREE.
> >      */
> >     curr_hkid = tdr_ptr->key_management_fields.hkid;
> > >     tdx_debug_assert(global_data_ptr->kot.entries[curr_hkid].state == KOT_STATE_HKID_FLUSHED);
> >     if (global_data_ptr->kot.entries[curr_hkid].wbinvd_bitmap != 0) /* HERE */
> >     {
> >         TDX_ERROR("CACHEWB is not complete for this HKID (=%x)\n", curr_hkid);
> >         return_val = TDX_WBCACHE_NOT_COMPLETE;
> >         goto EXIT;
> >     }
> > ...
> > 
> > Guess the conclusion is: if TDH.PHYMEM.CACHE.WB is not performed on each
> > required CPU correctly, TDH.MNG.KEY.FREEID will fail as well. A leak seems
> > the only option (none of us likes a leak, but...).
> 
> Why do we need to leak the key?  If we fail to allocate the cpumask, we can
> issue TDH.PHYMEM.CACHE.WB on all pCPUs instead of once per package.
> If we call TDH.PHYMEM.CACHE.WB multiple times on the same package, it may return
> an error, which is benign.  It is suboptimal, but much better than leaking the
> hkid.

I misunderstood the following sentence in the previous email. Now I get it:
the single loop handles both the normal path (one cache write-back per
package) and the allocation-failure path (a write-back on every online CPU).

> > We can move the check out of loop. But it would be ugly
> > (if () {cpu loop} else {cpu loop} ) and this function isn't performance
> > critical.
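
To check my understanding, here is a minimal userspace sketch of that loop.
It is not the kernel code: count_cache_wb_calls(), example_pkg_of_cpu and
MAX_PACKAGES are made-up names, mask_allocated stands in for the return value
of zalloc_cpumask_var(), and the counter stands in for actually issuing
TDH.PHYMEM.CACHE.WB on a CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PACKAGES 64	/* hypothetical upper bound for this sketch */

/* Hypothetical topology: CPU i belongs to package example_pkg_of_cpu[i]. */
static const int example_pkg_of_cpu[8] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/*
 * Count on how many CPUs the cache write-back would be issued.
 * With the mask available, only the first CPU of each package counts;
 * without it, every CPU counts (redundant, but nothing is skipped).
 */
static size_t count_cache_wb_calls(const int *pkg_of_cpu, size_t nr_cpus,
				   bool mask_allocated)
{
	bool seen[MAX_PACKAGES] = { false };
	size_t calls = 0;

	for (size_t i = 0; i < nr_cpus; i++) {
		int pkg = pkg_of_cpu[i];

		assert(pkg >= 0 && pkg < MAX_PACKAGES);
		if (mask_allocated) {
			if (seen[pkg])
				continue;	/* package already handled */
			seen[pkg] = true;
		}
		/* Stand-in for issuing TDH.PHYMEM.CACHE.WB on this CPU. */
		calls++;
	}
	return calls;
}
```

With the example topology above, the mask-allocated case issues the
write-back on 2 CPUs (one per package) and the fallback case on all 8, so
the hkid can still be freed afterwards in both cases.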