On Thu, Apr 25, 2024, Paolo Bonzini wrote: > On Thu, Apr 25, 2024 at 3:12 AM Isaku Yamahata <isaku.yamahata@xxxxxxxxx> wrote: > > > > get_user_pages_fast(source addr) > > > > read_lock(mmu_lock) > > > > kvm_tdp_mmu_get_walk_private_pfn(vcpu, gpa, &pfn); > > > > if the page table doesn't map gpa, error. > > > > TDH.MEM.PAGE.ADD() > > > > TDH.MR.EXTEND() > > > > read_unlock(mmu_lock) > > > > put_page() > > > > > > Hmm, KVM doesn't _need_ to use invalidate_lock to protect against guest_memfd > > > invalidation, but I also don't see why it would cause problems. > > The invalidate_lock is only needed to operate on the guest_memfd, but > it's a rwsem so there are no risks of lock inversion. > > > > I.e. why not > > > take mmu_lock() in TDX's post_populate() implementation? > > > > We can take the lock. Because we have already populated the GFN of guest_memfd, > > we need to make kvm_gmem_populate() not pass FGP_CREAT_ONLY. Otherwise we'll > > get -EEXIST. > > I don't understand why TDH.MEM.PAGE.ADD() cannot be called from the > post-populate hook. Can the code for TDH.MEM.PAGE.ADD be shared > between the memory initialization ioctl and the page fault hook in > kvm_x86_ops? Ah, because TDX is required to pre-fault the memory to establish the S-EPT walk, and pre-faulting means guest_memfd() Requiring that guest_memfd not have a page when initializing the guest image seems wrong, i.e. I don't think we want FGP_CREAT_ONLY. And not just because I am a fan of pre-faulting, I think the semantics are bad. E.g. IIUC, doing fallocate() to ensure memory is available would cause LAUNCH_UPDATE to fail. That's weird and has nothing to do with KVM_PRE_FAULT. I don't understand why we want FGP_CREAT_ONLY semantics. Who cares if there's a page allocated? KVM already checks that the page is unassigned in the RMP, so why does guest_memfd care whether or not the page was _just_ allocated? AFAIK, unwinding on failure is completely uninteresting, and arguably undesirable, because undoing LAUNCH_UPDATE or PAGE.ADD will affect the measurement, i.e. there is no scenario where deleting pages from guest_memfd would allow a restart/resume of the build process to truly succeed.