Re: [PATCH 2/2] KVM: Create debugfs dir and stat files for each VM


On 11/27/2015 09:42 PM, Tyler Baker wrote:
> On 27 November 2015 at 10:53, Tyler Baker <tyler.baker@xxxxxxxxxx> wrote:
>> On 27 November 2015 at 09:08, Tyler Baker <tyler.baker@xxxxxxxxxx> wrote:
>>> On 27 November 2015 at 00:54, Christian Borntraeger
>>> <borntraeger@xxxxxxxxxx> wrote:
>>>> On 11/26/2015 09:47 PM, Christian Borntraeger wrote:
>>>>> On 11/26/2015 05:17 PM, Tyler Baker wrote:
>>>>>> Hi Christian,
>>>>>>
>>>>>> The kernelci.org bot recently has been reporting kvm guest boot
>>>>>> failures[1] on various arm64 platforms in next-20151126. The bot
>>>>>> bisected[2] the failures to the commit in -next titled "KVM: Create
>>>>>> debugfs dir and stat files for each VM". I confirmed by reverting this
>>>>>> commit on top of next-20151126 it resolves the boot issue.
>>>>>>
>>>>>> In this test case the host and guest are booted with the same kernel.
>>>>>> The host is booted over nfs, installs qemu (qemu-system arm64 2.4.0),
>>>>>> and launches a guest. The host is booting fine, but when the guest is
>>>>>> launched it errors with "Failed to retrieve host CPU features!". I
>>>>>> checked the host logs, and found an "Unable to handle kernel paging
>>>>>> request" splat[3] which occurs when the guest is attempting to start.
>>>>>>
>>>>>> I scanned the patch in question but nothing obvious jumped out at me,
>>>>>> any thoughts?
>>>>>
>>>>> Not really.
>>>>> Do you have any processes running that read the files in /sys/kernel/debug/kvm/*?
>>>>>
>>>>> If I read the arm oops message correctly, it oopsed inside
>>>>> __srcu_read_lock. There is actually nothing in there that can oops,
>>>>> except the access to the preempt count. I am just guessing right now,
>>>>> but maybe the preempt variable is no longer available (as the process
>>>>> is gone). As long as a debugfs file is open, we hold a reference to
>>>>> the kvm, which holds a reference to the mm, so the mm might be killed
>>>>> after the process. But this is supposed to work, so maybe it's something
>>>>> different. An objdump of __srcu_read_lock might help.
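[One way to answer the question above, whether anything on the host holds kvm debugfs files open, is to walk the /proc/&lt;pid&gt;/fd symlinks. This is only a sketch, not part of the original thread; run it as root to see all processes (where lsof is installed, `lsof +D /sys/kernel/debug/kvm` does the same job):]

```shell
# List processes holding files under /sys/kernel/debug/kvm open,
# by resolving every /proc/<pid>/fd symlink. Unreadable fd entries
# (other users' processes, races with exiting tasks) are skipped.
for fd in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
        /sys/kernel/debug/kvm/*) echo "${fd%%/fd/*}: $target" ;;
    esac
done
echo "scan complete"
```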
>>>>
>>>> Hmm, the preempt thing is done in srcu_read_lock, but the crash is in
>>>> __srcu_read_lock. This function gets the srcu struct from mmu_notifier.c,
>>>> which must be present and is initialized during boot.
>>>>
>>>>
>>>> int __srcu_read_lock(struct srcu_struct *sp)
>>>> {
>>>>         int idx;
>>>>
>>>>         idx = READ_ONCE(sp->completed) & 0x1;
>>>>         __this_cpu_inc(sp->per_cpu_ref->c[idx]);
>>>>         smp_mb(); /* B */  /* Avoid leaking the critical section. */
>>>>         __this_cpu_inc(sp->per_cpu_ref->seq[idx]);
>>>>         return idx;
>>>> }
>>>>
>>>> Looking at the code, I have no clue why the patch makes a difference.
>>>> Can you try to get an objdump -S for __srcu_read_lock?
>>
>> Some other interesting finding below...
>>
>> On the host, I do _not_ have any nodes under /sys/kernel/debug/kvm/
>>
>> Running strace on the qemu command I use to launch the guest yields
>> the following.
>>
>> [pid  5963] 1448649724.405537 mmap(NULL, 65536, PROT_READ|PROT_WRITE,
>> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6652a000
>> [pid  5963] 1448649724.405586 read(13, "MemTotal:       16414616
>> kB\nMemF"..., 1024) = 1024
>> [pid  5963] 1448649724.405699 close(13) = 0
>> [pid  5963] 1448649724.405755 munmap(0x7f6652a000, 65536) = 0
>> [pid  5963] 1448649724.405947 brk(0x2552f000) = 0x2552f000
>> [pid  5963] 1448649724.406148 openat(AT_FDCWD, "/dev/kvm",
>> O_RDWR|O_CLOEXEC) = 13
>> [pid  5963] 1448649724.406209 ioctl(13, KVM_CREATE_VM, 0) = -1 ENOMEM
>> (Cannot allocate memory)
> 
> If I comment out the call to kvm_create_vm_debugfs(kvm), the guest boots
> fine. I put some printk's in the kvm_create_vm_debugfs() function and
> it's returning -ENOMEM after it evaluates !kvm->debugfs_dentry. I was
> chatting with some folks from the Linaro virtualization team and they
> mentioned that ARM is a bit special: the same PID creates two VMs in
> quick succession, the first being a scratch VM and the second the
> 'real' VM. With that bit of info, I suspect we may be trying to create
> the debugfs directory twice, and the second time it's failing because
> it already exists.

Hmmm, with a patched QEMU that calls KVM_CREATE_VM twice, it errors out on s390
with -ENOMEM (which it should not), but it does so gracefully.

Does the attached patch avoid the crash? (guest will not start, but qemu
should error out gracefully with ENOMEM)



diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f7d6c8f..b26472a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -671,12 +671,16 @@ static struct kvm *kvm_create_vm(unsigned long type)
 
 	r = kvm_create_vm_debugfs(kvm);
 	if (r)
-		goto out_err;
+		goto out_mmu;
 
 	preempt_notifier_inc();
 
 	return kvm;
 
+out_mmu:
+#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
+	mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm);
+#endif
 out_err:
 	cleanup_srcu_struct(&kvm->irq_srcu);
 out_err_no_irq_srcu:
