On Wed, Mar 09, 2022, Nikunj A. Dadhania wrote: > On 3/9/2022 3:23 AM, Mingwei Zhang wrote: > > On Tue, Mar 08, 2022, Nikunj A Dadhania wrote: > >> Use the memslot metadata to store the pinned data along with the pfns. > >> This improves the SEV guest startup time from O(n) to a constant by > >> deferring guest page pinning until the pages are used to satisfy > >> nested page faults. The page reference will be dropped in the memslot > >> free path or deallocation path. > >> > >> Reuse enc_region structure definition as pinned_region to maintain > >> pages that are pinned outside of MMU demand pinning. Remove rest of > >> the code which did upfront pinning, as they are no longer needed in > >> view of the demand pinning support. > > > > I don't quite understand why we still need the enc_region. I have > > several concerns. Details below. > > With patch 9 the enc_region is used only for memory that was pinned before > the vcpu is online (i.e. mmu is not yet usable) > > >> > >> Retain svm_register_enc_region() and svm_unregister_enc_region() with > >> required checks for resource limit. > >> > >> Guest boot time comparison > >> +---------------+----------------+-------------------+ > >> | Guest Memory | baseline | Demand Pinning | > >> | Size (GB) | (secs) | (secs) | > >> +---------------+----------------+-------------------+ > >> | 4 | 6.16 | 5.71 | > >> +---------------+----------------+-------------------+ > >> | 16 | 7.38 | 5.91 | > >> +---------------+----------------+-------------------+ > >> | 64 | 12.17 | 6.16 | > >> +---------------+----------------+-------------------+ > >> | 128 | 18.20 | 6.50 | > >> +---------------+----------------+-------------------+ > >> | 192 | 24.56 | 6.80 | > >> +---------------+----------------+-------------------+ > >> > >> Signed-off-by: Nikunj A Dadhania <nikunj@xxxxxxx> > >> --- > >> arch/x86/kvm/svm/sev.c | 304 ++++++++++++++++++++++++++--------------- > >> arch/x86/kvm/svm/svm.c | 1 + > >> arch/x86/kvm/svm/svm.h | 6 +- > >> 3 files changed, 200 insertions(+), 111 deletions(-) > >> > >> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c > >> index bd7572517c99..d0514975555d 100644 > >> --- a/arch/x86/kvm/svm/sev.c > >> +++ b/arch/x86/kvm/svm/sev.c > >> @@ -66,7 +66,7 @@ static unsigned int nr_asids; > >> static unsigned long *sev_asid_bitmap; > >> static unsigned long *sev_reclaim_asid_bitmap; > >> > >> -struct enc_region { > >> +struct pinned_region { > >> struct list_head list; > >> unsigned long npages; > >> struct page **pages; > >> @@ -257,7 +257,7 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp) > >> if (ret) > >> goto e_free; > >> > >> - INIT_LIST_HEAD(&sev->regions_list); > >> + INIT_LIST_HEAD(&sev->pinned_regions_list); > >> > >> return 0; > >> > >> @@ -378,16 +378,34 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp) > >> return ret; > >> } > >> > >> +static bool rlimit_memlock_exceeds(unsigned long locked, unsigned long npages) > >> +{ > >> + unsigned long lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; > >> + unsigned long lock_req; > >> + > >> + lock_req = locked + npages; > >> + return (lock_req > lock_limit) && !capable(CAP_IPC_LOCK); > >> +} > >> + > >> +static unsigned long get_npages(unsigned long uaddr, unsigned long ulen) > >> +{ > >> + unsigned long first, last; > >> + > >> + /* Calculate number of pages. */ > >> + first = (uaddr & PAGE_MASK) >> PAGE_SHIFT; > >> + last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT; > >> + return last - first + 1; > >> +} > >> + > >> static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr, > >> unsigned long ulen, unsigned long *n, > >> int write) > >> { > >> struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info; > >> + struct pinned_region *region; > >> unsigned long npages, size; > >> int npinned; > >> - unsigned long locked, lock_limit; > >> struct page **pages; > >> - unsigned long first, last; > >> int ret; > >> > >> lockdep_assert_held(&kvm->lock); > >> @@ -395,15 +413,12 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr, > >> if (ulen == 0 || uaddr + ulen < uaddr) > >> return ERR_PTR(-EINVAL); > >> > >> - /* Calculate number of pages. */ > >> - first = (uaddr & PAGE_MASK) >> PAGE_SHIFT; > >> - last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT; > >> - npages = (last - first + 1); > >> + npages = get_npages(uaddr, ulen); > >> > >> - locked = sev->pages_locked + npages; > >> - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; > >> - if (locked > lock_limit && !capable(CAP_IPC_LOCK)) { > >> - pr_err("SEV: %lu locked pages exceed the lock limit of %lu.\n", locked, lock_limit); > >> + if (rlimit_memlock_exceeds(sev->pages_to_lock, npages)) { > >> + pr_err("SEV: %lu locked pages exceed the lock limit of %lu.\n", > >> + sev->pages_to_lock + npages, > >> + (rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT)); > >> return ERR_PTR(-ENOMEM); > >> } > >> > >> @@ -429,7 +444,19 @@ static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr, > >> } > >> > >> *n = npages; > >> - sev->pages_locked = locked; > >> + sev->pages_to_lock += npages; > >> + > >> + /* Maintain region list that is pinned to be unpinned in vm destroy path */ > >> + region = kzalloc(sizeof(*region), GFP_KERNEL_ACCOUNT); > >> + if (!region) { > >> + ret = -ENOMEM; > >> + goto err; > >> + } > >> + region->uaddr = uaddr; > >> + region->size = ulen; > >> + region->pages = pages; > >> + region->npages = npages; > >> + list_add_tail(®ion->list, &sev->pinned_regions_list); > > > > Hmm. I see a duplication of the metadata. We already store the pfns in > > memslot. But now we also do it in regions. Is this one used for > > migration purpose? > > We are not duplicating, the enc_region holds regions that are pinned other > than svm_register_enc_region(). Later patches add infrastructure to directly > fault-in those pages which will use memslot->pfns. > > > > > I might miss some of the context here. > > More context here: > https://lore.kernel.org/kvm/CAMkAt6p1-82LTRNB3pkPRwYh=wGpreUN=jcUeBj_dZt8ss9w0Q@xxxxxxxxxxxxxx/ hmm. I think I might got the point. However, logically, I still think we might not need double data structures for pinning. When vcpu is not online, we could use the the array in memslot to contain the pinned pages, right? Since user-level code is not allowed to pin arbitrary regions of HVA, we could check that and bail out early if the region goes out of a memslot.