On Mon, Apr 08, 2013 at 10:11:52AM -0700, Andrew Honig wrote:
> On Mon, Apr 8, 2013 at 2:24 AM, Gleb Natapov <gleb@xxxxxxxxxx> wrote:
> > On Thu, Apr 04, 2013 at 12:39:47PM -0700, Andrew Honig wrote:
> >> If userspace creates and destroys multiple VMs within the same process
> >> we leak 20k of memory in the userspace process context per VM.  This
> >> patch frees the memory in kvm_arch_destroy_vm.  If the process exits
> >> without closing the VM file descriptor, or the file descriptor has been
> >> shared with another process, then we don't need to free the memory.
> >>
> >> Messing with user space memory from an fd is not ideal, but other
> >> approaches would require user space changes, and this is consistent
> >> with how the memory is currently allocated.
> >>
> >> Tested: Ran several VMs, and ran a test program meant to
> >> demonstrate the leak (www.spinics.net/lists/kvm/msg83734.html).
> >>
> >> Signed-off-by: Andrew Honig <ahonig@xxxxxxxxxx>
> >>
> >> ---
> >>  arch/x86/include/asm/kvm_host.h |    3 +++
> >>  arch/x86/kvm/vmx.c              |    3 +++
> >>  arch/x86/kvm/x86.c              |   11 +++++++++++
> >>  3 files changed, 17 insertions(+)
> >>
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index 4979778..975a74d 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -553,6 +553,9 @@ struct kvm_arch {
> >>  	struct page *ept_identity_pagetable;
> >>  	bool ept_identity_pagetable_done;
> >>  	gpa_t ept_identity_map_addr;
> >> +	unsigned long ept_ptr;
> >> +	unsigned long apic_ptr;
> >> +	unsigned long tss_ptr;
> >>
> > Better to use __kvm_set_memory_region() with memory_size = 0 to delete
> > the slot, and fix kvm_arch_prepare_memory_region() to unmap if
> > change == KVM_MR_DELETE.
> >
> Will do in the next version.
> >>  	unsigned long irq_sources_bitmap;
> >>  	s64 kvmclock_offset;
> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >> index 6667042..8aa5d81 100644
> >> --- a/arch/x86/kvm/vmx.c
> >> +++ b/arch/x86/kvm/vmx.c
> >> @@ -3703,6 +3703,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
> >>  	}
> >>
> >>  	kvm->arch.apic_access_page = page;
> >> +	kvm->arch.apic_ptr = kvm_userspace_mem.userspace_addr;
> >>  out:
> >>  	mutex_unlock(&kvm->slots_lock);
> >>  	return r;
> >> @@ -3733,6 +3734,7 @@ static int alloc_identity_pagetable(struct kvm *kvm)
> >>  	}
> >>
> >>  	kvm->arch.ept_identity_pagetable = page;
> >> +	kvm->arch.ept_ptr = kvm_userspace_mem.userspace_addr;
> >>  out:
> >>  	mutex_unlock(&kvm->slots_lock);
> >>  	return r;
> >> @@ -4366,6 +4368,7 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
> >>  	if (ret)
> >>  		return ret;
> >>  	kvm->arch.tss_addr = addr;
> >> +	kvm->arch.tss_ptr = tss_mem.userspace_addr;
> >>  	if (!init_rmode_tss(kvm))
> >>  		return -ENOMEM;
> >>
> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index f19ac0a..411ff2a 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -6812,6 +6812,16 @@ void kvm_arch_sync_events(struct kvm *kvm)
> >>
> >>  void kvm_arch_destroy_vm(struct kvm *kvm)
> >>  {
> >> +	if (current->mm == kvm->mm) {
> >> +		/*
> >> +		 * Free pages allocated on behalf of userspace, unless the
> >> +		 * memory map has changed due to process exit or fd
> >> +		 * copying.
> >> +		 */
> > Why does the mm change during process exit?  And what do you mean by fd
> > copying — one process creates the kvm fd and passes it to another?  In
> > that case I think the leak will still be there, since all of the
> > addresses below are mapped after the kvm fd is created:
> > apic_access_page and identity_pagetable during first vcpu creation,
> > and the tss when the KVM_SET_TSS_ADDR ioctl is called.  Vcpu creation
> > and the ioctl call can be done by a different process from the one
> > that created the kvm fd.
> >
>
> The mm changes during process exit because exit_mm is called to clean
> up process memory before exit_files is called.  If the process exits
> without closing the fds then the VM will be closed after the mm is
> destroyed and set to NULL.  Without checking for that, we'd access the
> invalid mm and panic.
>
OK, thanks for the explanation.

> By fd copying I mean passing the fd to another process over unix
> domain sockets or by using fork().  My understanding was that vcpu
> creation and the ioctl call could not be done by a different process
> than the one that created the kvm fd.  Both kvm_vm_ioctl and
> kvm_vcpu_ioctl start with a check to prevent this:
>
> 	if (kvm->mm != current->mm)
> 		return -EIO;
>
> There are already many places in the code that require the mm being
> used by a VCPU to be the same as the mm used by the VM.
>
Right you are.

> I agree that a process could still fork or otherwise pass the fd
> between processes to induce the 5 pages to leak, but this couldn't
> happen unless the processes were badly misusing the API in a way that
> wouldn't work anyway.  This is also of no use to a malicious user
> space application, because the 5 pages are in the user space context.
>
> These issues notwithstanding, this is a huge improvement over the
> current situation, where there's no way a user space application can
> allocate and deallocate VMs without leaking memory each time.
>
> I considered other approaches, but they all involved more
> coordination from user space and changes to the ABI.  Some thoughts
> were:
>
> 1) Expose the addresses of the private memslots so user space has the
>    opportunity to free them.
> 2) Allow user space to provide the memory used for the private memslots.
> 3) Add an ioctl which cleans up the private memslots, to be called
>    before destroying the VM.
>
> I'm happy to try any of those or listen to other ideas.  Would you
> prefer to see an approach like that, or should I send a second version
> of this current patch?
Please send a second version.  Option 2 sounds reasonable, but since it
involves changing the API it would be a long time before we could
deprecate the current approach, so if we can make the current code work
reasonably well we may as well stay with it.

--
			Gleb.