Re: [PATCH] KVM: x86: Fix memory leak in vmx.c

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Apr 8, 2013 at 2:24 AM, Gleb Natapov <gleb@xxxxxxxxxx> wrote:
> On Thu, Apr 04, 2013 at 12:39:47PM -0700, Andrew Honig wrote:
>> If userspace creates and destroys multiple VMs within the same process
>> we leak 20k of memory in the userspace process context per VM.  This
>> patch frees the memory in kvm_arch_destroy_vm.  If the process exits
>> without closing the VM file descriptor or the file descriptor has been
>> shared with another process then we don't need to free the memory.
>>
>> Messing with user space memory from an fd is not ideal, but other changes
>> would require user space changes and this is consistent with how the
>> memory is currently allocated.
>>
>> Tested: Test ran several VMs and ran against test program meant to
>> demonstrate the leak (www.spinics.net/lists/kvm/msg83734.html).
>>
>> Signed-off-by: Andrew Honig <ahonig@xxxxxxxxxx>
>>
>> ---
>>  arch/x86/include/asm/kvm_host.h |    3 +++
>>  arch/x86/kvm/vmx.c              |    3 +++
>>  arch/x86/kvm/x86.c              |   11 +++++++++++
>>  3 files changed, 17 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 4979778..975a74d 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -553,6 +553,9 @@ struct kvm_arch {
>>       struct page *ept_identity_pagetable;
>>       bool ept_identity_pagetable_done;
>>       gpa_t ept_identity_map_addr;
>> +     unsigned long ept_ptr;
>> +     unsigned long apic_ptr;
>> +     unsigned long tss_ptr;
>>
> Better to use __kvm_set_memory_region() with memory_size = 0 to delete
> the slot and fix kvm_arch_prepare_memory_region() to unmap if
> change == KVM_MR_DELETE.
>
Will do in the next version.

>>       unsigned long irq_sources_bitmap;
>>       s64 kvmclock_offset;
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 6667042..8aa5d81 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -3703,6 +3703,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
>>       }
>>
>>       kvm->arch.apic_access_page = page;
>> +     kvm->arch.apic_ptr = kvm_userspace_mem.userspace_addr;
>>  out:
>>       mutex_unlock(&kvm->slots_lock);
>>       return r;
>> @@ -3733,6 +3734,7 @@ static int alloc_identity_pagetable(struct kvm *kvm)
>>       }
>>
>>       kvm->arch.ept_identity_pagetable = page;
>> +     kvm->arch.ept_ptr = kvm_userspace_mem.userspace_addr;
>>  out:
>>       mutex_unlock(&kvm->slots_lock);
>>       return r;
>> @@ -4366,6 +4368,7 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
>>       if (ret)
>>               return ret;
>>       kvm->arch.tss_addr = addr;
>> +     kvm->arch.tss_ptr = tss_mem.userspace_addr;
>>       if (!init_rmode_tss(kvm))
>>               return  -ENOMEM;
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index f19ac0a..411ff2a 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -6812,6 +6812,16 @@ void kvm_arch_sync_events(struct kvm *kvm)
>>
>>  void kvm_arch_destroy_vm(struct kvm *kvm)
>>  {
>> +     if (current->mm == kvm->mm) {
>> +             /*
>> +              * Free pages allocated on behalf of userspace, unless the
>> +              * the memory map has changed due to process exit or fd
>> +              * copying.
>> +              */
> Why mm changes during process exit? And what do you mean by fd copying?
> One process creates kvm fd and pass it to another? In this case I think
> the leak will still be there since all of the addresses bellow are
> mapped after kvm fd is created. apic_access_page and identity_pagetable
> during first vcpu creation and tss when KVM_SET_TSS_ADDR ioctl is
> called. Vcpu creation and ioctl call can be done by different process
> from the one that created kvm fd.
>
>

The mm changes during process exit because exit_mm is called to clean
up process memory before exit_files is called.  If the process exits
without closing the fds then the vm will be closed after the mm is
destroyed and set to null.  Without checking for that, we'd access the
invalid mm and panic.

By fd copying I mean passing the fd to another process over unix
domain sockets or by using fork().  My understanding was that vcpu
creation and ioctl call could not be done by a different process than
the one that created by kvm fd.  Both kvm_vm_ioctl and kvm_vcpu_ioctl
start with a check to prevent this:
if (kvm->mm != current->mm)
              return -EIO;
As there's already lots of requirements in the code that the mm being
used by a VCPU is the same as the mm used by the VM.

I agree that a process could still fork or otherwise pass the fd
between processes to induce the 5 pages to leak, but this couldn't
happen unless the processes were badly misusing the API in a way that
wouldn't work anyway.  This is also of no use to a malicious user
space application either, because the 5 pages are in the user space
context.

These issues not withstanding this is a huge improvement over the
current situation where there's no way a user space application can
allocate and deallocate VMs without leaking memory each time.

I considered other approaches, but I they all involved more
coordination from user space and changes to the ABI.  Some thoughts
were
1) Expose the addresses of the private memslots so user space has the
opportunity to free them.
2) Allow user space to provide the memory used for the private memslots.
3) Add an ioctl which cleans up the private memslots to be called
before destroying the VM.

I'm happy to try any of those or listen to other ideas.  Would you
prefer to see an approach like that, or should I send a second version
of this current patch?

>> +             vm_munmap(kvm->arch.apic_ptr, PAGE_SIZE);
>> +             vm_munmap(kvm->arch.ept_ptr, PAGE_SIZE);
>> +             vm_munmap(kvm->arch.tss_ptr, PAGE_SIZE * 3);
>> +     }
>>       kvm_iommu_unmap_guest(kvm);
>>       kfree(kvm->arch.vpic);
>>       kfree(kvm->arch.vioapic);
>> @@ -6929,6 +6939,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>                       return PTR_ERR((void *)userspace_addr);
>>
>>               memslot->userspace_addr = userspace_addr;
>> +             mem->userspace_addr = userspace_addr;
>>       }
>>
>>       return 0;
>> --
>> 1.7.10.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> --
>                         Gleb.

thanks,
Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux