Re: [RFC PATCH v5 073/104] KVM: TDX: track LP tdx vcpu run and teardown vcpus on descroing the guest TD

Erdem Aktas <erdemaktas@xxxxxxxxxx> · Wed, 23 Mar 2022 13:17:45 -0700

So the tdh_vp_flush should always succeed while vm is being torn down.
Thanks Isaku for the explanation, and I think it would be great to add
the error message.

-Erdem

On Wed, Mar 23, 2022 at 12:08 PM Isaku Yamahata
<isaku.yamahata@xxxxxxxxx> wrote:
>
> On Tue, Mar 22, 2022 at 05:54:45PM -0700,
> Erdem Aktas <erdemaktas@xxxxxxxxxx> wrote:
>
> > On Fri, Mar 4, 2022 at 11:50 AM <isaku.yamahata@xxxxxxxxx> wrote:
> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index a6b1a8ce888d..690298fb99c7 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -48,6 +48,14 @@ struct tdx_capabilities tdx_caps;
> > >  static DEFINE_MUTEX(tdx_lock);
> > >  static struct mutex *tdx_mng_key_config_lock;
> > >
> > > +/*
> > > + * A per-CPU list of TD vCPUs associated with a given CPU.  Used when a CPU
> > > + * is brought down to invoke TDH_VP_FLUSH on the approapriate TD vCPUS.
> > > + * Protected by interrupt mask.  This list is manipulated in process context
> > > + * of vcpu and IPI callback.  See tdx_flush_vp_on_cpu().
> > > + */
> > > +static DEFINE_PER_CPU(struct list_head, associated_tdvcpus);
> > > +
> > >  static u64 hkid_mask __ro_after_init;
> > >  static u8 hkid_start_pos __ro_after_init;
> > >
> > > @@ -87,6 +95,8 @@ static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx)
> > >
> > >  static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
> > >  {
> > > +       list_del(&to_tdx(vcpu)->cpu_list);
> > > +
> > >         /*
> > >          * Ensure tdx->cpu_list is updated is before setting vcpu->cpu to -1,
> > >          * otherwise, a different CPU can see vcpu->cpu = -1 and add the vCPU
> > > @@ -97,6 +107,22 @@ static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
> > >         vcpu->cpu = -1;
> > >  }
> > >
> > > +void tdx_hardware_enable(void)
> > > +{
> > > +       INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, raw_smp_processor_id()));
> > > +}
> > > +
> > > +void tdx_hardware_disable(void)
> > > +{
> > > +       int cpu = raw_smp_processor_id();
> > > +       struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, cpu);
> > > +       struct vcpu_tdx *tdx, *tmp;
> > > +
> > > +       /* Safe variant needed as tdx_disassociate_vp() deletes the entry. */
> > > +       list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list)
> > > +               tdx_disassociate_vp(&tdx->vcpu);
> > > +}
> > > +
> > >  static void tdx_clear_page(unsigned long page)
> > >  {
> > >         const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
> > > @@ -230,9 +256,11 @@ void tdx_mmu_prezap(struct kvm *kvm)
> > >         struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > >         cpumask_var_t packages;
> > >         bool cpumask_allocated;
> > > +       struct kvm_vcpu *vcpu;
> > >         u64 err;
> > >         int ret;
> > >         int i;
> > > +       unsigned long j;
> > >
> > >         if (!is_hkid_assigned(kvm_tdx))
> > >                 return;
> > > @@ -248,6 +276,17 @@ void tdx_mmu_prezap(struct kvm *kvm)
> > >                 return;
> > >         }
> > >
> > > +       kvm_for_each_vcpu(j, vcpu, kvm)
> > > +               tdx_flush_vp_on_cpu(vcpu);
> > > +
> > > +       mutex_lock(&tdx_lock);
> > > +       err = tdh_mng_vpflushdone(kvm_tdx->tdr.pa);
> >
> > Hi Isaku,
>
> Hi.
>
>
> > I am wondering about the impact of the failures on these functions. Is
> > there any other function which recovers any failures here?
> > When I look at the tdx_flush_vp function, it seems like it can fail
> > due to task migration so tdx_flush_vp_on_cpu might also fail and if it
> > fails, tdh_mng_vpflushdone returns err. Since tdx_vm_teardown does not
> > return any error , how the VMM can free the keyid used in this TD.
> > Will they be forever in "used state"?
> > Also if tdx_vm_teardown fails, the kvm_tdx->hkid is never set to -1
> > which will prevent tdx_vcpu_free to free and reclaim the resources
> > allocated for the vcpu.
>
> mmu_prezap() is called via release callback of mmu notifier when the last mmu
> reference of this process is dropped.  It is after all kvm vcpu fd and kvm vm
> fd were closed.  vcpu will never run.  But we still hold kvm_vcpu structures.
> There is no race between tdh_vp_flush()/tdh_mng_vpflushdone() here and process
> migration.  tdh_vp_flush()/tdh_mng_vp_flushdone() should success.
>
> The cpuid check in tdx_flush_vp() is for vcpu_load() which may race with process
> migration.
>
> Anyway what if one of those TDX seamcalls fails?  HKID is leaked and will be
> never used because there is no good way to free and use HKID safely.  Such
> failure is due to unknown issue and probably a bug.
>
> One mitigation is to add pr_err() when HKID leak happens.  I'll add such message
> on next respin.
>
> thanks,
> --
> Isaku Yamahata <isaku.yamahata@xxxxxxxxx>