On 1/20/2022 12:22 AM, Maciej S. Szmigiero wrote:
> On 19.01.2022 07:33, Nikunj A. Dadhania wrote:
>> On 1/18/2022 8:30 PM, Maciej S. Szmigiero wrote:
>>> On 18.01.2022 12:06, Nikunj A Dadhania wrote:
>>>> +static struct page **sev_pin_memory_in_mmu(struct kvm *kvm, unsigned long addr,
>>>> +                                           unsigned long size,
>>>> +                                           unsigned long *npages)
>>>> +{
>>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> +       struct kvm_vcpu *vcpu;
>>>> +       struct page **pages;
>>>> +       unsigned long i;
>>>> +       u32 error_code;
>>>> +       kvm_pfn_t pfn;
>>>> +       int idx, ret = 0;
>>>> +       gpa_t gpa;
>>>> +       bool ro;
>>>> +
>>>> +       pages = sev_alloc_pages(sev, addr, size, npages);
>>>> +       if (IS_ERR(pages))
>>>> +               return pages;
>>>> +
>>>> +       vcpu = kvm_get_vcpu(kvm, 0);
>>>> +       if (mutex_lock_killable(&vcpu->mutex)) {
>>>> +               kvfree(pages);
>>>> +               return ERR_PTR(-EINTR);
>>>> +       }
>>>> +
>>>> +       vcpu_load(vcpu);
>>>> +       idx = srcu_read_lock(&kvm->srcu);
>>>> +
>>>> +       kvm_mmu_load(vcpu);
>>>> +
>>>> +       for (i = 0; i < *npages; i++, addr += PAGE_SIZE) {
>>>> +               if (signal_pending(current)) {
>>>> +                       ret = -ERESTARTSYS;
>>>> +                       break;
>>>> +               }
>>>> +
>>>> +               if (need_resched())
>>>> +                       cond_resched();
>>>> +
>>>> +               gpa = hva_to_gpa(kvm, addr, &ro);
>>>> +               if (gpa == UNMAPPED_GVA) {
>>>> +                       ret = -EFAULT;
>>>> +                       break;
>>>> +               }
>>>
>>> This function is going to have worst case O(n²) complexity if called with
>>> the whole VM memory (or O(n * log(n)) when hva_to_memslot() is modified
>>> to use kvm_for_each_memslot_in_hva_range()).
>>
>> I understand your concern and will address it. BTW, this is called for a small
>> fragment of VM memory (< 10 MB) that needs to be pinned before the guest
>> execution starts.
>
> I understand it is a relatively small memory area now, but a rewrite of
> this patch that makes use of kvm_for_each_memslot_in_hva_range() while
> taking care of other considerations (like overlapping hva) will also
> solve the performance issue.
>
>>> That's really bad for something that can be done in O(n) time - look how
>>> kvm_for_each_memslot_in_gfn_range() does it over gfns.
>>>
>>
>> I saw one use of kvm_for_each_memslot_in_gfn_range() in __kvm_zap_rmaps(), and
>> that too calls slot_handle_level_range() which has a for_each_slot_rmap_range().
>> How would that be O(n)?
>>
>> kvm_for_each_memslot_in_gfn_range() {
>>         ...
>>         slot_handle_level_range()
>>         ...
>> }
>>
>> slot_handle_level_range() {
>>         ...
>>         for_each_slot_rmap_range() {
>>                 ...
>>         }
>>         ...
>> }
>
> kvm_for_each_memslot_in_gfn_range() iterates over gfns, which are unique,
> so at most one memslot is returned per gfn (and if a memslot covers
> multiple gfns in the requested range it will be returned just once).
>
> for_each_slot_rmap_range() then iterates over rmaps covering that
> *single* memslot: look at slot_rmap_walk_next() - the memslot under
> iteration is not advanced.
>
> So each memslot returned by kvm_for_each_memslot_in_gfn_range() is
> iterated over just once by the aforementioned macro.
>
>>> Besides performance considerations I can't see the code here taking into
>>> account the fact that a hva can map to multiple memslots (they can overlap
>>> in the host address space).
>>
>> You are right, I was returning at the first match. It looks like if I switch
>> to using kvm_for_each_memslot_in_hva_range() it should take care of
>> overlapping hva - is this understanding correct?
>
> Let's say that the requested range of hva for sev_pin_memory_in_mmu() to
> handle is 0x1000 - 0x2000.
>
> If there are three memslots:
> 1: hva 0x1000 - 0x2000 -> gpa 0x1000 - 0x2000
> 2: hva 0x1000 - 0x2000 -> gpa 0x2000 - 0x3000
> 3: hva 0x2000 - 0x3000 -> gpa 0x3000 - 0x4000
>
> then kvm_for_each_memslot_in_hva_range() will return the first two,
> essentially covering the hva range of 0x1000 - 0x2000 twice.
>
> If such hva aliases are permitted the code has to be ready for this case
> and handle it sensibly:
> if you need to return just a single struct page per hva AND / OR pin
> operations aren't idempotent then it has to keep track of which hvas were
> already processed.
>
> Another, and probably the easiest, option would be to simply disallow
> such overlapping memslots in the requested range and make the
> KVM_SEV_LAUNCH_UPDATE_DATA ioctl return something like EINVAL in this
> case - if that would be acceptable semantics for this ioctl.
>
> In any case, the main loop in sev_pin_memory_in_mmu() will probably
> need to be built around a kvm_for_each_memslot_in_hva_range() call,
> which will then solve the performance issue, too.

Sure, I have already tried this out and have the walk implemented using a
kvm_for_each_memslot_in_hva_range() call - roughly along the lines of the
sketch below.

Regards
Nikunj
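Untested outline, only to show the shape of the walk: the per-page fault/pin
step from the patch is elided, sev_walk_hva_range() is a made-up name for
illustration, and the hva_node / node_idx plumbing is lifted from the memslot
interval-tree code as I understand it, so treat the details as assumptions
rather than the final implementation.

/*
 * Sketch: visit each memslot intersecting [addr, addr + size) exactly once,
 * instead of doing a per-page hva_to_gpa() lookup.  Caller is expected to
 * hold kvm->srcu, as sev_pin_memory_in_mmu() already does.
 */
static long sev_walk_hva_range(struct kvm *kvm, unsigned long addr,
                               unsigned long size)
{
        struct kvm_memslots *slots = kvm_memslots(kvm);
        struct kvm_memory_slot *memslot;
        struct interval_tree_node *node;
        unsigned long start = addr, end = addr + size;
        long npinned = 0;

        /* The interval tree lookup takes an inclusive last address. */
        kvm_for_each_memslot_in_hva_range(node, slots, start, end - 1) {
                unsigned long hva_start, hva_end, hva;

                memslot = container_of(node, struct kvm_memory_slot,
                                       hva_node[slots->node_idx]);

                /* Clamp to the part of this memslot the request covers. */
                hva_start = max(start, memslot->userspace_addr);
                hva_end = min(end, memslot->userspace_addr +
                                   (memslot->npages << PAGE_SHIFT));

                for (hva = hva_start; hva < hva_end; hva += PAGE_SIZE) {
                        gfn_t gfn = hva_to_gfn_memslot(hva, memslot);

                        /*
                         * Placeholder: fault in and pin @gfn here, as in the
                         * loop body of sev_pin_memory_in_mmu() above.
                         * Overlapping memslots (aliased hva) still need
                         * deduplication or an -EINVAL, as discussed.
                         */
                        pr_debug("would pin hva 0x%lx -> gfn 0x%llx\n",
                                 hva, gfn);
                        npinned++;
                }
        }

        return npinned;
}

The outer iteration is per intersecting memslot (O(number of memslots) via the
interval tree) and the inner loop is per page within the clamped overlap, which
is the O(n) behaviour being asked for here.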