On 1/20/2022 12:22 AM, Maciej S. Szmigiero wrote:
> On 19.01.2022 07:33, Nikunj A. Dadhania wrote:
>> On 1/18/2022 8:30 PM, Maciej S. Szmigiero wrote:
>>> On 18.01.2022 12:06, Nikunj A Dadhania wrote:
>>>> +static struct page **sev_pin_memory_in_mmu(struct kvm *kvm, unsigned long addr,
>>>> +                                           unsigned long size,
>>>> +                                           unsigned long *npages)
>>>> +{
>>>> +       struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
>>>> +       struct kvm_vcpu *vcpu;
>>>> +       struct page **pages;
>>>> +       unsigned long i;
>>>> +       u32 error_code;
>>>> +       kvm_pfn_t pfn;
>>>> +       int idx, ret = 0;
>>>> +       gpa_t gpa;
>>>> +       bool ro;
>>>> +
>>>> +       pages = sev_alloc_pages(sev, addr, size, npages);
>>>> +       if (IS_ERR(pages))
>>>> +               return pages;
>>>> +
>>>> +       vcpu = kvm_get_vcpu(kvm, 0);
>>>> +       if (mutex_lock_killable(&vcpu->mutex)) {
>>>> +               kvfree(pages);
>>>> +               return ERR_PTR(-EINTR);
>>>> +       }
>>>> +
>>>> +       vcpu_load(vcpu);
>>>> +       idx = srcu_read_lock(&kvm->srcu);
>>>> +
>>>> +       kvm_mmu_load(vcpu);
>>>> +
>>>> +       for (i = 0; i < *npages; i++, addr += PAGE_SIZE) {
>>>> +               if (signal_pending(current)) {
>>>> +                       ret = -ERESTARTSYS;
>>>> +                       break;
>>>> +               }
>>>> +
>>>> +               if (need_resched())
>>>> +                       cond_resched();
>>>> +
>>>> +               gpa = hva_to_gpa(kvm, addr, &ro);
>>>> +               if (gpa == UNMAPPED_GVA) {
>>>> +                       ret = -EFAULT;
>>>> +                       break;
>>>> +               }
>>>
>>> This function is going to have worst case O(n²) complexity if called with
>>> the whole VM memory (or O(n * log(n)) when hva_to_memslot() is modified
>>> to use kvm_for_each_memslot_in_hva_range()).
>>
>> I understand your concern and will address it. BTW, this is called for a small
>> fragment of VM memory (< 10 MB) that needs to be pinned before the guest
>> execution starts.
>
> I understand it is a relatively small memory area now, but a rewrite of
> this patch that makes use of kvm_for_each_memslot_in_hva_range() while
> taking care of other considerations (like overlapping hva) will also
> solve the performance issue.
>
>>> That's really bad for something that can be done in O(n) time - look how
>>> kvm_for_each_memslot_in_gfn_range() does it over gfns.
>>>
>>
>> I saw one use of kvm_for_each_memslot_in_gfn_range() in __kvm_zap_rmaps(), and
>> that too calls slot_handle_level_range() which has a for_each_slot_rmap_range().
>> How would that be O(n)?
>>
>> kvm_for_each_memslot_in_gfn_range() {
>>         ...
>>         slot_handle_level_range()
>>         ...
>> }
>>
>> slot_handle_level_range() {
>>         ...
>>         for_each_slot_rmap_range() {
>>                 ...
>>         }
>>         ...
>> }
>
> kvm_for_each_memslot_in_gfn_range() iterates over gfns, which are unique,
> so at most one memslot is returned per gfn (and if a memslot covers
> multiple gfns in the requested range it will be returned just once).
>
> for_each_slot_rmap_range() then iterates over rmaps covering that
> *single* memslot: look at slot_rmap_walk_next() - the memslot under
> iteration is not advanced.
>
> So each memslot returned by kvm_for_each_memslot_in_gfn_range() is
> iterated over just once by the aforementioned macro.
>
>>> Besides performance considerations I can't see the code here taking into
>>> account the fact that a hva can map to multiple memslots (they can overlap
>>> in the host address space).
>>
>> You are right, I was returning at the first match. It looks like if I switch
>> to using kvm_for_each_memslot_in_hva_range() it should take care of
>> overlapping hva - is this understanding correct?
>
> Let's say that the requested range of hva for sev_pin_memory_in_mmu() to
> handle is 0x1000 - 0x2000.
>
> If there are three memslots:
> 1: hva 0x1000 - 0x2000 -> gpa 0x1000 - 0x2000
> 2: hva 0x1000 - 0x2000 -> gpa 0x2000 - 0x3000
> 3: hva 0x2000 - 0x3000 -> gpa 0x3000 - 0x4000
>
> then kvm_for_each_memslot_in_hva_range() will return the first two,
> essentially covering the hva range of 0x1000 - 0x2000 twice.
>
> If such hva aliases are permitted the code has to be ready for this case
> and handle it sensibly:
> if you need to return just a single struct page per hva AND / OR pin
> operations aren't idempotent then it has to keep track of which hvas were
> already processed.
>
> Another, and probably the easiest, option would be to simply disallow
> such overlapping memslots in the requested range and make the
> KVM_SEV_LAUNCH_UPDATE_DATA ioctl return something like EINVAL in this
> case - if that would be acceptable semantics for this ioctl.
>
> In any case, the main loop in sev_pin_memory_in_mmu() will probably
> need to be built around a kvm_for_each_memslot_in_hva_range() call,
> which will then solve the performance issue, too.

Sure, I have already tried this out and have the walk implemented using a
kvm_for_each_memslot_in_hva_range() call - roughly along the lines of the
sketch below.

Regards
Nikunj
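Untested outline, only to show the shape of the walk: the per-page fault/pin
step from the patch is elided, sev_walk_hva_range() is a made-up name for
illustration, and the hva_node / node_idx plumbing is lifted from the memslot
interval-tree code as I understand it, so treat the details as assumptions
rather than the final implementation.

/*
 * Sketch: visit each memslot intersecting [addr, addr + size) exactly once,
 * instead of doing a per-page hva_to_gpa() lookup.  Caller is expected to
 * hold kvm->srcu, as sev_pin_memory_in_mmu() already does.
 */
static long sev_walk_hva_range(struct kvm *kvm, unsigned long addr,
                               unsigned long size)
{
        struct kvm_memslots *slots = kvm_memslots(kvm);
        struct kvm_memory_slot *memslot;
        struct interval_tree_node *node;
        unsigned long start = addr, end = addr + size;
        long npinned = 0;

        /* The interval tree lookup takes an inclusive last address. */
        kvm_for_each_memslot_in_hva_range(node, slots, start, end - 1) {
                unsigned long hva_start, hva_end, hva;

                memslot = container_of(node, struct kvm_memory_slot,
                                       hva_node[slots->node_idx]);

                /* Clamp to the part of this memslot the request covers. */
                hva_start = max(start, memslot->userspace_addr);
                hva_end = min(end, memslot->userspace_addr +
                                   (memslot->npages << PAGE_SHIFT));

                for (hva = hva_start; hva < hva_end; hva += PAGE_SIZE) {
                        gfn_t gfn = hva_to_gfn_memslot(hva, memslot);

                        /*
                         * Placeholder: fault in and pin @gfn here, as in the
                         * loop body of sev_pin_memory_in_mmu() above.
                         * Overlapping memslots (aliased hva) still need
                         * deduplication or an -EINVAL, as discussed.
                         */
                        pr_debug("would pin hva 0x%lx -> gfn 0x%llx\n",
                                 hva, gfn);
                        npinned++;
                }
        }

        return npinned;
}

The outer iteration is per intersecting memslot (O(number of memslots) via the
interval tree) and the inner loop is per page within the clamped overlap, which
is the O(n) behaviour being asked for here.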