On 3/21/2022 11:41 AM, Mingwei Zhang wrote:
> On Wed, Mar 09, 2022, Nikunj A. Dadhania wrote:
>> On 3/9/2022 3:23 AM, Mingwei Zhang wrote:
>>> On Tue, Mar 08, 2022, Nikunj A Dadhania wrote:
>>>> Use the memslot metadata to store the pinned data along with the pfns.
>>>> This improves the SEV guest startup time from O(n) to a constant by
>>>> deferring guest page pinning until the pages are used to satisfy
>>>> nested page faults. The page reference will be dropped in the memslot
>>>> free path or deallocation path.
>>>>
>>>> Reuse enc_region structure definition as pinned_region to maintain
>>>> pages that are pinned outside of MMU demand pinning. Remove rest of
>>>> the code which did upfront pinning, as they are no longer needed in
>>>> view of the demand pinning support.
>>>
>>> I don't quite understand why we still need the enc_region. I have
>>> several concerns. Details below.
>>
>> With patch 9 the enc_region is used only for memory that was pinned before
>> the vcpu is online (i.e. mmu is not yet usable)
>>
>>>>
>>>> Retain svm_register_enc_region() and svm_unregister_enc_region() with
>>>> required checks for resource limit.
>>>>
>>>> Guest boot time comparison
>>>> +---------------+----------------+-------------------+
>>>> | Guest Memory  | baseline       | Demand Pinning    |
>>>> | Size (GB)     | (secs)         | (secs)            |
>>>> +---------------+----------------+-------------------+
>>>> |       4       |      6.16      |       5.71        |
>>>> +---------------+----------------+-------------------+
>>>> |      16       |      7.38      |       5.91        |
>>>> +---------------+----------------+-------------------+
>>>> |      64       |     12.17      |       6.16        |
>>>> +---------------+----------------+-------------------+
>>>> |     128       |     18.20      |       6.50        |
>>>> +---------------+----------------+-------------------+
>>>> |     192       |     24.56      |       6.80        |
>>>> +---------------+----------------+-------------------+
>>>>
>>>> Signed-off-by: Nikunj A Dadhania <nikunj@xxxxxxx>
>>>> ---
>>>>  arch/x86/kvm/svm/sev.c | 304 ++++++++++++++++++++++++++---------------
>>>>  arch/x86/kvm/svm/svm.c |   1 +
>>>>  arch/x86/kvm/svm/svm.h |   6 +-
>>>>  3 files changed, 200 insertions(+), 111 deletions(-)
>>>>
>>>> <SNIP>
>>>>
>>>>  static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
>>>>                                      unsigned long ulen, unsigned long *n,
>>>>                                      int write)
>>>>  {
>>>>          struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> +        struct pinned_region *region;
>>>>          unsigned long npages, size;
>>>>          int npinned;
>>>> -        unsigned long locked, lock_limit;
>>>>          struct page **pages;
>>>> -        unsigned long first, last;
>>>>          int ret;
>>>>
>>>>          lockdep_assert_held(&kvm->lock);
>>>> @@ -395,15 +413,12 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
>>>>          if (ulen == 0 || uaddr + ulen < uaddr)
>>>>                  return ERR_PTR(-EINVAL);
>>>>
>>>> -        /* Calculate number of pages. */
>>>> -        first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
>>>> -        last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;
>>>> -        npages = (last - first + 1);
>>>> +        npages = get_npages(uaddr, ulen);
>>>>
>>>> -        locked = sev->pages_locked + npages;
>>>> -        lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>>>> -        if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
>>>> -                pr_err("SEV: %lu locked pages exceed the lock limit of %lu.\n", locked, lock_limit);
>>>> +        if (rlimit_memlock_exceeds(sev->pages_to_lock, npages)) {
>>>> +                pr_err("SEV: %lu locked pages exceed the lock limit of %lu.\n",
>>>> +                       sev->pages_to_lock + npages,
>>>> +                       (rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT));
>>>>                  return ERR_PTR(-ENOMEM);
>>>>          }
>>>>
>>>> @@ -429,7 +444,19 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
>>>>          }
>>>>
>>>>          *n = npages;
>>>> -        sev->pages_locked = locked;
>>>> +        sev->pages_to_lock += npages;
>>>> +
>>>> +        /* Maintain region list that is pinned to be unpinned in vm destroy path */
>>>> +        region = kzalloc(sizeof(*region), GFP_KERNEL_ACCOUNT);
>>>> +        if (!region) {
>>>> +                ret = -ENOMEM;
>>>> +                goto err;
>>>> +        }
>>>> +        region->uaddr = uaddr;
>>>> +        region->size = ulen;
>>>> +        region->pages = pages;
>>>> +        region->npages = npages;
>>>> +        list_add_tail(&region->list, &sev->pinned_regions_list);
>>>
>>> Hmm. I see a duplication of the metadata. We already store the pfns in
>>> memslot. But now we also do it in regions. Is this one used for
>>> migration purpose?
>>
>> We are not duplicating, the enc_region holds regions that are pinned other
>> than svm_register_enc_region(). Later patches add infrastructure to directly
>> fault-in those pages which will use memslot->pfns.
>>
>>>
>>> I might miss some of the context here.
>>
>> More context here:
>> https://lore.kernel.org/kvm/CAMkAt6p1-82LTRNB3pkPRwYh=wGpreUN=jcUeBj_dZt8ss9w0Q@xxxxxxxxxxxxxx/
>
> hmm. I think I might got the point. However, logically, I still think we
> might not need double data structures for pinning. When vcpu is not
> online, we could use the the array in memslot to contain the pinned
> pages, right?

Yes.

> Since user-level code is not allowed to pin arbitrary regions of HVA, we
> could check that and bail out early if the region goes out of a memslot.
>
> From that point, the only requirement is that we need a valid memslot
> before doing memory encryption and pinning. So enc_region is still not
> needed from this point.
>
> This should save some time to avoid double pinning and make the pinning
> information clear.

Agreed, I think that should be possible:
* Check for addr/end being part of a memslot
* Error out in case it is not part of any memslot
* Add __sev_pin_pfn() which is not dependent on the vcpu arg (rough sketch
  after the loop below)
* Iterate over the pages and use the __sev_pin_pfn() routine to pin:

        struct kvm_memory_slot *slot;
        struct kvm_memslots *slots;
        struct interval_tree_node *node;
        unsigned long slot_start, slot_end, hva_start, hva_end, uaddr;

        slots = kvm_memslots(kvm);
        kvm_for_each_memslot_in_hva_range(node, slots, addr, end) {
                slot = container_of(node, struct kvm_memory_slot,
                                    hva_node[slots->node_idx]);
                slot_start = slot->userspace_addr;
                slot_end = slot_start + (slot->npages << PAGE_SHIFT);
                hva_start = max(addr, slot_start);
                hva_end = min(end, slot_end);

                for (uaddr = hva_start; uaddr < hva_end; uaddr += PAGE_SIZE)
                        __sev_pin_pfn(slot, uaddr, PG_LEVEL_4K);
        }
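For the helper itself, something along these lines should work (untested,
just to illustrate the idea; slot->pfns stands for the per-memslot pfn
array that the earlier patches in this series introduce, and the
locked-pages accounting and error unwinding are left out):

        /*
         * Illustrative only: pin one 4K page at @uaddr and record its pfn
         * in the memslot metadata. Only PG_LEVEL_4K is handled here.
         */
        static int __sev_pin_pfn(struct kvm_memory_slot *slot,
                                 unsigned long uaddr, int level)
        {
                unsigned long idx = (uaddr - slot->userspace_addr) >> PAGE_SHIFT;
                struct page *page;

                if (pin_user_pages_fast(uaddr, 1, FOLL_WRITE, &page) != 1)
                        return -ENOMEM;

                slot->pfns[idx] = page_to_pfn(page);
                return 0;
        }

With that, memory pinned before the vcpus are online goes through the same
memslot metadata as the demand-pinning path.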
This will make sure the memslot-based data structure is used and enc_region
can be removed.

Regards
Nikunj