On Tue, Dec 24, 2024 at 03:31:19PM +0100, Paolo Bonzini wrote: > On 11/12/24 08:38, Yan Zhao wrote: > > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx> > > > > Introduce a new VM-scoped KVM_MEMORY_ENCRYPT_OP IOCTL subcommand, > > KVM_TDX_FINALIZE_VM, to perform TD Measurement Finalization. > > > > The API documentation is provided in a separate patch: > > “Documentation/virt/kvm: Document on Trust Domain Extensions (TDX)”. > > > > Enhance TDX’s set_external_spte() hook to record the pre-mapping count > > instead of returning without action when the TD is not finalized. > > > > Adjust the pre-mapping count when pages are added or if the mapping is > > dropped. > > > > Set pre_fault_allowed to true after the finalization is complete. > > > > Note: TD Measurement Finalization is the process by which the initial state > > of the TDX VM is measured for attestation purposes. It uses the SEAMCALL > > TDH.MR.FINALIZE, after which: > > 1. The VMM can no longer add TD private pages with arbitrary content. > > 2. The TDX VM becomes runnable. > > > > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx> > > Co-developed-by: Adrian Hunter <adrian.hunter@xxxxxxxxx> > > Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx> > > Co-developed-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx> > > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx> > > Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx> > > --- > > TDX MMU part 2 v2 > > - Merge changes from patch "KVM: TDX: Premap initial guest memory" into > > this patch (Paolo) > > - Consolidate nr_premapped counting into this patch (Paolo) > > - Page level check should be (and is) in tdx_sept_set_private_spte() in > > patch "KVM: TDX: Implement hooks to propagate changes of TDP MMU mirror > > page table" not in tdx_mem_page_record_premap_cnt() (Paolo) > > - Protect finalization using kvm->slots_lock (Paolo) > > - Set kvm->arch.pre_fault_allowed to true after finalization is done > > (Paolo) > > - Add a memory barrier to ensure correct ordering of the updates to > > kvm_tdx->finalized and kvm->arch.pre_fault_allowed (Adrian) > > - pre_fault_allowed must not be true before finalization is done. > > Highlight that fact by checking it in tdx_mem_page_record_premap_cnt() > > (Adrian) > > - No need for is_td_finalized() (Rick) > > - Fixup SEAMCALL call sites due to function parameter changes to SEAMCALL > > wrappers (Kai) > > - Add nr_premapped where it's first used (Tao) > > I have just a couple imprecesions to note: > - stale reference to 'finalized' Thanks for catching that! > - atomic64_read WARN should block the following atomic64_dec (there is still > a small race window but it's not worth using a dec-if-not-zero operation) > - rename tdx_td_finalizemr to tdx_td_finalize Right. I didn't notice it because of the mr in tdh_mr_finalize(). :) > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index 61e4f126addd..eb0de85c3413 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -609,8 +609,8 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, > get_page(pfn_to_page(pfn)); > /* > - * To match ordering of 'finalized' and 'pre_fault_allowed' in > - * tdx_td_finalizemr(). > + * Read 'pre_fault_allowed' before 'kvm_tdx->state'; see matching > + * barrier in tdx_td_finalize(). > */ > smp_rmb(); > if (likely(kvm_tdx->state == TD_STATE_RUNNABLE)) > @@ -1397,7 +1397,7 @@ void tdx_flush_tlb_all(struct kvm_vcpu *vcpu) > ept_sync_global(); > } > -static int tdx_td_finalizemr(struct kvm *kvm, struct kvm_tdx_cmd *cmd) > +static int tdx_td_finalize(struct kvm *kvm, struct kvm_tdx_cmd *cmd) > { > struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm); > @@ -1452,7 +1452,7 @@ int tdx_vm_ioctl(struct kvm *kvm, void __user *argp) > r = tdx_td_init(kvm, &tdx_cmd); > break; > case KVM_TDX_FINALIZE_VM: > - r = tdx_td_finalizemr(kvm, &tdx_cmd); > + r = tdx_td_finalize(kvm, &tdx_cmd); > break; > default: > r = -EINVAL; > @@ -1715,8 +1715,8 @@ static int tdx_gmem_post_populate(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, > goto out; > } > - WARN_ON_ONCE(!atomic64_read(&kvm_tdx->nr_premapped)); > - atomic64_dec(&kvm_tdx->nr_premapped); > + if (!WARN_ON_ONCE(!atomic64_read(&kvm_tdx->nr_premapped))) > + atomic64_dec(&kvm_tdx->nr_premapped); One concern here. If tdx_gmem_post_populate() is called when kvm_tdx->nr_premapped is 0, it will trigger the WARN_ON here, indicating that something has gone wrong. Should KVM refuse to start the TD in this case? If we don't decrease kvm_tdx->nr_premapped in that case, it will remain 0, allowing it to pass the check in tdx_td_finalize(). > if (arg->flags & KVM_TDX_MEASURE_MEMORY_REGION) { > for (i = 0; i < PAGE_SIZE; i += TDX_EXTENDMR_CHUNKSIZE) { >