Re: [PATCH v8 020/103] KVM: TDX: create/destroy VM structure

Isaku Yamahata <isaku.yamahata@xxxxxxxxx> · Mon, 29 Aug 2022 12:09:21 -0700

On Sat, Aug 27, 2022 at 11:52:39AM +0800,
Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx> wrote:

> > +static void tdx_clear_page(unsigned long page)
> > +{
> > +	const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
> > +	unsigned long i;
> > +
> > +	/*
> > +	 * Zeroing the page is only necessary for systems with MKTME-i:
> > +	 * when re-assign one page from old keyid to a new keyid, MOVDIR64B is
> > +	 * required to clear/write the page with new keyid to prevent integrity
> > +	 * error when read on the page with new keyid.
> > +	 */
> > +	if (!static_cpu_has(X86_FEATURE_MOVDIR64B))
> > +		return;
> 
> TDX relies on MKTME, and MOVDIR64B is a must have feature. The check should
> not fail at this point?
> 
> It feels a bit strange to check the feature here and return siliently if the
> check failed.

Makes sense. This code is carried from the early devel phase.  I'll move this
check to tdx module initialization.

> > +
> > +	for (i = 0; i < 4096; i += 64)
> > +		/* MOVDIR64B [rdx], es:rdi */
> > +		asm (".byte 0x66, 0x0f, 0x38, 0xf8, 0x3a"
> > +		     : : "d" (zero_page), "D" (page + i) : "memory");
> 
> There is already have a inline function movdir64b defined in
> arch/x86/include/asm/special_insns.h, can we use it directly here?

Sure I'll use the function.

> > +}
> > +
> > +static int tdx_reclaim_page(unsigned long va, hpa_t pa, bool do_wb, u16 hkid)
> > +{
> > +	struct tdx_module_output out;
> > +	u64 err;
> > +
> > +	err = tdh_phymem_page_reclaim(pa, &out);
> > +	if (WARN_ON_ONCE(err)) {
> > +		pr_tdx_error(TDH_PHYMEM_PAGE_RECLAIM, err, &out);
> > +		return -EIO;
> > +	}
> > +
> > +	if (do_wb) {
> > +		err = tdh_phymem_page_wbinvd(set_hkid_to_hpa(pa, hkid));
> > +		if (WARN_ON_ONCE(err)) {
> > +			pr_tdx_error(TDH_PHYMEM_PAGE_WBINVD, err, NULL);
> > +			return -EIO;
> > +		}
> > +	}
> > +
> > +	tdx_clear_page(va);
> 
> Is it really necessary to clear the reclaimed page using MOVDIR64?
> 
> According to the TDX module spec,  when add a page to TD, both for control
> structures and TD private memory, during the process some function of the
> TDX module will initialize the page using binding hkid and direct write
> (MOVDIR64B).
> 
> So still need to clear the page using direct write to avoid integrity error
> when re-assign one page from old keyid to a new keyid as you mentioned in
> the comment?

Yes. As you described above, TDX module does when assining a page to a private
hkid. i.e. TDH.MEM.PAGE.{ADD, AUG}.  But when re-assigning a page from an old
private hkid to a new _shared_ hkid, i.e. TDH.MEM.PAGE.REMOVE or
TDH.PHYMEM.PAGE.RECLAIM, TDX module doesn't.

> > +void tdx_mmu_release_hkid(struct kvm *kvm)
> > +{
> > +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > +	cpumask_var_t packages;
> > +	bool cpumask_allocated;
> > +	u64 err;
> > +	int ret;
> > +	int i;
> > +
> > +	if (!is_hkid_assigned(kvm_tdx))
> > +		return;
> > +
> > +	if (!is_td_created(kvm_tdx))
> > +		goto free_hkid;
> > +
> > +	cpumask_allocated = zalloc_cpumask_var(&packages, GFP_KERNEL);
> > +	cpus_read_lock();
> > +	for_each_online_cpu(i) {
> > +		if (cpumask_allocated &&
> > +			cpumask_test_and_set_cpu(topology_physical_package_id(i),
> > +						packages))
> > +			continue;
> > +
> > +		/*
> > +		 * We can destroy multiple the guest TDs simultaneously.
> > +		 * Prevent tdh_phymem_cache_wb from returning TDX_BUSY by
> > +		 * serialization.
> > +		 */
> > +		mutex_lock(&tdx_lock);
> > +		ret = smp_call_on_cpu(i, tdx_do_tdh_phymem_cache_wb, NULL, 1);
> > +		mutex_unlock(&tdx_lock);
> > +		if (ret)
> > +			break;
> > +	}
> > +	cpus_read_unlock();
> > +	free_cpumask_var(packages);
> > +
> > +	mutex_lock(&tdx_lock);
> > +	err = tdh_mng_key_freeid(kvm_tdx->tdr.pa);
> 
> According to the TDX module spec, there is a API called
> TDH.MNG.KEY.RECLAIMID, which is used to put the TD in blocked state.
> 
> I didn't see the API used in the patch. Is it not used or did I miss
> something?

In the public spec of TDX module of 344425-004US June 2022, table 2.4 says
"TDH.MNG.KEY.RECLAIMID 27 Does nothing; provided for backward compatibility"

> > +static int tdx_do_tdh_mng_key_config(void *param)
> > +{
> > +	hpa_t *tdr_p = param;
> > +	u64 err;
> > +
> > +	do {
> > +		err = tdh_mng_key_config(*tdr_p);
> > +
> > +		/*
> > +		 * If it failed to generate a random key, retry it because this
> > +		 * is typically caused by an entropy error of the CPU's random
> > +		 * number generator.
> > +		 */
> > +	} while (err == TDX_KEY_GENERATION_FAILED);
> 
> Is there any corner case that could lead to deadloop?

The error happens due to the lack of entropy of pconfig instruction.  If the
entropy on the platform could be drained constantly somehow, the dead loop could
be possible.  I think it's very unlikely.
-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>