On Mon, Apr 08, 2013 at 10:11:52AM -0700, Andrew Honig wrote:
> On Mon, Apr 8, 2013 at 2:24 AM, Gleb Natapov <gleb@xxxxxxxxxx> wrote:
> > On Thu, Apr 04, 2013 at 12:39:47PM -0700, Andrew Honig wrote:
> >> If userspace creates and destroys multiple VMs within the same process
> >> we leak 20k of memory in the userspace process context per VM.  This
> >> patch frees the memory in kvm_arch_destroy_vm.  If the process exits
> >> without closing the VM file descriptor, or the file descriptor has been
> >> shared with another process, then we don't need to free the memory.
> >>
> >> Messing with user space memory from an fd is not ideal, but other
> >> approaches would require user space changes, and this is consistent
> >> with how the memory is currently allocated.
> >>
> >> Tested: Ran several VMs, and ran a test program meant to
> >> demonstrate the leak (www.spinics.net/lists/kvm/msg83734.html).
> >>
> >> Signed-off-by: Andrew Honig <ahonig@xxxxxxxxxx>
> >>
> >> ---
> >>  arch/x86/include/asm/kvm_host.h |    3 +++
> >>  arch/x86/kvm/vmx.c              |    3 +++
> >>  arch/x86/kvm/x86.c              |   11 +++++++++++
> >>  3 files changed, 17 insertions(+)
> >>
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index 4979778..975a74d 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -553,6 +553,9 @@ struct kvm_arch {
> >>  	struct page *ept_identity_pagetable;
> >>  	bool ept_identity_pagetable_done;
> >>  	gpa_t ept_identity_map_addr;
> >> +	unsigned long ept_ptr;
> >> +	unsigned long apic_ptr;
> >> +	unsigned long tss_ptr;
> >>
> > Better to use __kvm_set_memory_region() with memory_size = 0 to delete
> > the slot, and fix kvm_arch_prepare_memory_region() to unmap if
> > change == KVM_MR_DELETE.
> >
> Will do in the next version.
> >>  	unsigned long irq_sources_bitmap;
> >>  	s64 kvmclock_offset;
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 6667042..8aa5d81 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -3703,6 +3703,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
> >>  	}
> >>
> >>  	kvm->arch.apic_access_page = page;
> >> +	kvm->arch.apic_ptr = kvm_userspace_mem.userspace_addr;
> >>  out:
> >>  	mutex_unlock(&kvm->slots_lock);
> >>  	return r;
> >> @@ -3733,6 +3734,7 @@ static int alloc_identity_pagetable(struct kvm *kvm)
> >>  	}
> >>
> >>  	kvm->arch.ept_identity_pagetable = page;
> >> +	kvm->arch.ept_ptr = kvm_userspace_mem.userspace_addr;
> >>  out:
> >>  	mutex_unlock(&kvm->slots_lock);
> >>  	return r;
> >> @@ -4366,6 +4368,7 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
> >>  	if (ret)
> >>  		return ret;
> >>  	kvm->arch.tss_addr = addr;
> >> +	kvm->arch.tss_ptr = tss_mem.userspace_addr;
> >>  	if (!init_rmode_tss(kvm))
> >>  		return -ENOMEM;
> >>
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index f19ac0a..411ff2a 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -6812,6 +6812,16 @@ void kvm_arch_sync_events(struct kvm *kvm)
> >>
> >>  void kvm_arch_destroy_vm(struct kvm *kvm)
> >>  {
> >> +	if (current->mm == kvm->mm) {
> >> +		/*
> >> +		 * Free pages allocated on behalf of userspace, unless the
> >> +		 * memory map has changed due to process exit or fd
> >> +		 * copying.
> >> +		 */
> > Why does the mm change during process exit?  And what do you mean by fd
> > copying — one process creates the kvm fd and passes it to another?  In
> > that case I think the leak will still be there, since all of the
> > addresses below are mapped after the kvm fd is created:
> > apic_access_page and identity_pagetable during first vcpu creation,
> > and the tss when the KVM_SET_TSS_ADDR ioctl is called.  Vcpu creation
> > and the ioctl call can be done by a different process from the one
> > that created the kvm fd.
> >
>
> The mm changes during process exit because exit_mm is called to clean
> up process memory before exit_files is called.  If the process exits
> without closing the fds then the VM will be closed after the mm is
> destroyed and set to NULL.  Without checking for that, we'd access the
> invalid mm and panic.
>
OK, thanks for the explanation.

> By fd copying I mean passing the fd to another process over unix
> domain sockets or by using fork().  My understanding was that vcpu
> creation and the ioctl call could not be done by a different process
> than the one that created the kvm fd.  Both kvm_vm_ioctl and
> kvm_vcpu_ioctl start with a check to prevent this:
>
> 	if (kvm->mm != current->mm)
> 		return -EIO;
>
> There are already many places in the code that require the mm being
> used by a VCPU to be the same as the mm used by the VM.
>
Right you are.

> I agree that a process could still fork or otherwise pass the fd
> between processes to induce the 5 pages to leak, but this couldn't
> happen unless the processes were badly misusing the API in a way that
> wouldn't work anyway.  This is also of no use to a malicious user
> space application, because the 5 pages are in the user space context.
>
> These issues notwithstanding, this is a huge improvement over the
> current situation, where there's no way a user space application can
> allocate and deallocate VMs without leaking memory each time.
>
> I considered other approaches, but they all involved more
> coordination from user space and changes to the ABI.  Some thoughts
> were:
>
> 1) Expose the addresses of the private memslots so user space has the
>    opportunity to free them.
> 2) Allow user space to provide the memory used for the private memslots.
> 3) Add an ioctl which cleans up the private memslots, to be called
>    before destroying the VM.
>
> I'm happy to try any of those or listen to other ideas.  Would you
> prefer to see an approach like that, or should I send a second version
> of this current patch?
Please send a second version.  Option 2 sounds reasonable, but since it
involves changing the API it would be a long time before we could
deprecate the current approach, so if we can make the current code work
reasonably well we may as well stay with it.

--
			Gleb.