On Wed, 5 Apr 2023 11:16:18 -0700 Isaku Yamahata <isaku.yamahata@xxxxxxxxx> wrote: > On Sun, Apr 02, 2023 at 11:41:58AM +0300, > Zhi Wang <zhi.wang.linux@xxxxxxxxx> wrote: > > > > > > +void tdx_mmu_release_hkid(struct kvm *kvm) > > > > > +{ > > > > > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); > > > > > + cpumask_var_t packages; > > > > > + bool cpumask_allocated; > > > > > + u64 err; > > > > > + int ret; > > > > > + int i; > > > > > + > > > > > + if (!is_hkid_assigned(kvm_tdx)) > > > > > + return; > > > > > + > > > > > + if (!is_td_created(kvm_tdx)) > > > > > + goto free_hkid; > > > > > + > > > > > + cpumask_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL); > > > > > + cpus_read_lock(); > > > > > + for_each_online_cpu(i) { > > > > > + if (cpumask_allocated && > > > > > + cpumask_test_and_set_cpu(topology_physical_package_id(i), > > > > > + packages)) > > > > > + continue; > > > > > > > > Is this necessary to check cpumask_allocated in the while loop? if cpumask > > > > is not succefully allocated, wouldn't it be better to bail out just after > > > > it? > > > > > > No because we can't return error here. It's better to do in-efficiently freeing > > > resources instead of leak. > > > > > > We can move the check out of loop. But it would be ugly > > > (if () {cpu loop} else {cpu loop} ) and this function isn't performance > > > critical. Also I think it's okay to depend on compiler optimization for loop > > > invariant. My compiler didn't optimize it in this case, though. > > > > > > > Do you mean the tdh_mng_key_freeid() is still required if failing to allocate > > the cpumask var and do TDH.PHYMEM_CACHE_WB(WBINVD) on each CPU? > > > > > Out of curiosity, I took a look on the TDX module source code [1], it seems TDX > > module has an additional check in TDH.MNG.KEY.FREEID. TDH.MNG.VPFLUSHDONE [2] > > will mark the pending wbinvd in a bitmap: > > > > ... > > /** > > * Create the WBINVD_BITMAP per-package. > > * Set to 1 num_of_pkgs bits from the LSB > > */ > > global_data_ptr->kot.entries[curr_hkid].wbinvd_bitmap = global_data_ptr->pkg_config_bitmap; /* <----HERE */ > > > > // Set new TD life cycle state > > tdr_ptr->management_fields.lifecycle_state = TD_BLOCKED; > > > > // Set the proper new KOT entry state > > global_data_ptr->kot.entries[curr_hkid].state = (uint8_t)KOT_STATE_HKID_FLUSHED; > > ... > > > > And TDH.MNG.KEY.FREEID [3] will check if the pending WBINVD has been performed: > > > > ... > > /** > > * If TDH_PHYMEM_CACHE_WB was executed on all packages/cores, > > * set the KOT entry, set the KOT entry state to HKID_FREE. > > */ > > curr_hkid = tdr_ptr->key_management_fields.hkid; > > tdx_debug_assert(global_data_ptr->kot.entr/ies[curr_hkid].state == KOT_STATE_HKID_FLUSHED); > > if (global_data_ptr->kot.entries[curr_hkid].wbinvd_bitmap != 0) /* HERE */ > > { > > TDX_ERROR("CACHEWB is not complete for this HKID (=%x)\n", curr_hkid); > > return_val = TDX_WBCACHE_NOT_COMPLETE; > > goto EXIT; > > } > > ... > > > > Guess the conclusion is: if TDH.PHYMEM.CACHE.WB is not performed on each > > required CPU correctly, TDH.MNG.KEY.FREEID will fail as well. A leak seems > > the only option (none of us likes a leak, but...). > > Why do we need to leak key? If we fails to allocate cpumask, we can issue > TDH.PHYMEM.CACHE.WB on all pCPUs instead of all packages. > If we call TDH.PHYMEM.CACHE.WB multiple times on the same package, it may return > error. It's benign. It is suboptimal, but it's much better than leaking hkid. I guess I misunderstood the following sentence in the previous email. Now I get it. It is a combination of failure-resolving and normal-resolving. > > We can move the check out of loop. But it would be ugly > > (if () {cpu loop} else {cpu loop} ) and this function isn't performance > > critical.