On 3/7/2022 1:18 AM, Mingwei Zhang wrote:
> On Tue, Jan 18, 2022, Nikunj A Dadhania wrote:
>> Use the memslot metadata to store the pinned data along with the pfns.
>> This improves the SEV guest startup time from O(n) to a constant by
>> deferring guest page pinning until the pages are used to satisfy nested
>> page faults. The page reference will be dropped in the memslot free
>> path.
>>
>> Remove the enc_region structure definition and the code which did
>> upfront pinning, as they are no longer needed in view of the demand
>> pinning support.
>>
>> Leave svm_register_enc_region() and svm_unregister_enc_region() as stubs
>> since qemu is dependent on this API.
>>
>> Signed-off-by: Nikunj A Dadhania <nikunj@xxxxxxx>
>> ---
>> +
>> +	/* Pin the page, KVM doesn't yet support page migration. */
>> +	for (i = 0; i < npages; i++, rel_gfn++, pin_pfn++) {
>> +		if (test_bit(rel_gfn, aslot->pinned_bitmap)) {
>> +			old_pfn = aslot->pfns[rel_gfn];
>> +			if (old_pfn == pin_pfn)
>> +				continue;
>> +
>> +			put_page(pfn_to_page(old_pfn));
>
> You need to flush the old pfn using VMPAGE_FLUSH before doing put_page.
> Normally, this should not happen. But if the user-level VMM is
> malicious, then it could just munmap() the region (not the memslot);
> mmap() it again; let the guest VM touch the page and you will see this
> path get executed.
>
> Clearly, this will slow down the faulting path if this happens. So,
> alternatively, you can register a hook in mmu_notifier and shoot a flush
> there according to the bitmap. Either way should work.

We can call sev_flush_guest_memory() before the put_page(); see the
sketch below.

>> +		}
>> +
>> +		set_bit(rel_gfn, aslot->pinned_bitmap);
>> +		aslot->pfns[rel_gfn] = pin_pfn;
>> +		get_page(pfn_to_page(pin_pfn));
>> +	}
>> +
>> +	/*
>> +	 * Flush any cached lines of the page being added since "ownership" of
>> +	 * it will be transferred from the host to an encrypted guest.
>> +	 */
>> +	clflush_cache_range(__va(pfn << PAGE_SHIFT), page_level_size(level));
>> +}
>> +
>>  void sev_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
>>  {
>>  	struct kvm_arch_memory_slot *aslot = &slot->arch;
>> +	kvm_pfn_t *pfns;
>> +	gfn_t gfn;
>> +	int i;
>>
>>  	if (!sev_guest(kvm))
>>  		return;
>>
>> +	if (!aslot->pinned_bitmap || !slot->arch.pfns)
>> +		goto out;
>> +
>> +	pfns = aslot->pfns;
>> +
>> +	/*
>> +	 * Iterate the memslot to find the pinned pfn using the bitmap and drop
>> +	 * the pfn stored.
>> +	 */
>> +	for (i = 0, gfn = slot->base_gfn; i < slot->npages; i++, gfn++) {
>> +		if (test_and_clear_bit(i, aslot->pinned_bitmap)) {
>> +			if (WARN_ON(!pfns[i]))
>> +				continue;
>> +
>> +			put_page(pfn_to_page(pfns[i]));
>
> Here, you get lucky that you don't have to flush the cache. However,
> this is because sev_free_memslot() is called after kvm_arch_destroy_vm(),
> which flushes the cache system-wide.

I have added wbinvd_on_all_cpus() just before the iteration in my new
version; a sketch of that is below as well.

Regards
Nikunj
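
For reference, a minimal sketch of the reuse path with that flush folded
in. It assumes a struct vcpu_svm "svm" is available at this call site
(the in-tree sev_flush_guest_memory() takes one to look up the guest ASID
for the VM_PAGE_FLUSH MSR, and returns early on X86_FEATURE_SME_COHERENT);
the surrounding loop is unchanged from the hunk above:

	/* Pin the page, KVM doesn't yet support page migration. */
	for (i = 0; i < npages; i++, rel_gfn++, pin_pfn++) {
		if (test_bit(rel_gfn, aslot->pinned_bitmap)) {
			old_pfn = aslot->pfns[rel_gfn];
			if (old_pfn == pin_pfn)
				continue;

			/*
			 * Flush the stale pfn before dropping the
			 * reference: it may still hold encrypted guest
			 * data, and the CPU does not keep encrypted and
			 * unencrypted mappings of the same physical page
			 * coherent.
			 */
			sev_flush_guest_memory(svm, __va(old_pfn << PAGE_SHIFT),
					       PAGE_SIZE);
			put_page(pfn_to_page(old_pfn));
		}

		set_bit(rel_gfn, aslot->pinned_bitmap);
		aslot->pfns[rel_gfn] = pin_pfn;
		get_page(pfn_to_page(pin_pfn));
	}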
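
And a sketch of sev_free_memslot() with the wbinvd_on_all_cpus() placed
before the iteration, so the flush no longer depends on
kvm_arch_destroy_vm() having already run system-wide. The cleanup under
the out: label is assumed since the hunk does not show it, and the unused
gfn counter is dropped here:

void sev_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
{
	struct kvm_arch_memory_slot *aslot = &slot->arch;
	kvm_pfn_t *pfns;
	int i;

	if (!sev_guest(kvm))
		return;

	if (!aslot->pinned_bitmap || !aslot->pfns)
		goto out;

	pfns = aslot->pfns;

	/*
	 * Flush before dropping the pinned pages: unlike the VM-destroy
	 * path, deleting a memslot of a live VM gets no system-wide
	 * flush from kvm_arch_destroy_vm().
	 */
	wbinvd_on_all_cpus();

	/*
	 * Iterate the memslot to find the pinned pfn using the bitmap and
	 * drop the pfn stored.
	 */
	for (i = 0; i < slot->npages; i++) {
		if (test_and_clear_bit(i, aslot->pinned_bitmap)) {
			if (WARN_ON(!pfns[i]))
				continue;

			put_page(pfn_to_page(pfns[i]));
		}
	}

out:
	kvfree(aslot->pinned_bitmap);
	kvfree(aslot->pfns);
	aslot->pinned_bitmap = NULL;
	aslot->pfns = NULL;
}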