On 11/27/2015 09:42 PM, Tyler Baker wrote:
On 27 November 2015 at 10:53, Tyler Baker <tyler.baker@xxxxxxxxxx> wrote:
On 27 November 2015 at 09:08, Tyler Baker <tyler.baker@xxxxxxxxxx> wrote:
On 27 November 2015 at 00:54, Christian Borntraeger
<borntraeger@xxxxxxxxxx> wrote:
On 11/26/2015 09:47 PM, Christian Borntraeger wrote:
On 11/26/2015 05:17 PM, Tyler Baker wrote:
Hi Christian,
The kernelci.org bot recently has been reporting kvm guest boot
failures[1] on various arm64 platforms in next-20151126. The bot
bisected[2] the failures to the commit in -next titled "KVM: Create
debugfs dir and stat files for each VM". I confirmed by reverting this
commit on top of next-20151126 it resolves the boot issue.
In this test case the host and guest are booted with the same kernel.
The host is booted over nfs, installs qemu (qemu-system arm64 2.4.0),
and launches a guest. The host is booting fine, but when the guest is
launched it errors with "Failed to retrieve host CPU features!". I
checked the host logs, and found an "Unable to handle kernel paging
request" splat[3] which occurs when the guest is attempting to start.
I scanned the patch in question but nothing obvious jumped out at me,
any thoughts?
Not really.
Do you have processing running that do read the files in /sys/kernel/debug/kvm/* ?
If I read the arm oops message correctly it oopsed inside
__srcu_read_lock. there is actually nothing in there that can oops,
except the access to the preempt count. I am just guessing right now,
but maybe the preempt variable is no longer available (as the process
is gone). As long as a debugfs file is open, we hold a reference to
the kvm, which holds a reference to the mm, so the mm might be killed
after the process. But this is supposed to work, so maybe its something
different. An objdump of __srcu_read_lock might help.
Hmm, the preempt thing is done in srcu_read_lock, but the crash is in
__srcu_read_lock. This function gets the srcu struct from mmu_notifier.c,
which must be present and is initialized during boot.
int __srcu_read_lock(struct srcu_struct *sp)
{
int idx;
idx = READ_ONCE(sp->completed) & 0x1;
__this_cpu_inc(sp->per_cpu_ref->c[idx]);
smp_mb(); /* B */ /* Avoid leaking the critical section. */
__this_cpu_inc(sp->per_cpu_ref->seq[idx]);
return idx;
}
Looking at the code I have no clue why the patch does make a difference.
Can you try to get an objdump -S for__Srcu_read_lock?
Some other interesting finding below...
On the host, I do _not_ have any nodes under /sys/kernel/debug/kvm/
Running strace on the qemu command I use to launch the guest yields
the following.
[pid 5963] 1448649724.405537 mmap(NULL, 65536, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6652a000
[pid 5963] 1448649724.405586 read(13, "MemTotal: 16414616
kB\nMemF"..., 1024) = 1024
[pid 5963] 1448649724.405699 close(13) = 0
[pid 5963] 1448649724.405755 munmap(0x7f6652a000, 65536) = 0
[pid 5963] 1448649724.405947 brk(0x2552f000) = 0x2552f000
[pid 5963] 1448649724.406148 openat(AT_FDCWD, "/dev/kvm",
O_RDWR|O_CLOEXEC) = 13
[pid 5963] 1448649724.406209 ioctl(13, KVM_CREATE_VM, 0) = -1 ENOMEM
(Cannot allocate memory)
If I comment the call to kvm_create_vm_debugfs(kvm) the guest boots
fine. I put some printk's in the kvm_create_vm_debugfs() function and
it's returning -ENOMEM after it evaluates !kvm->debugfs_dentry. I was
chatting with some folks from the Linaro virtualization team and they
mentioned that ARM is a bit special as the same PID creates two vms in
quick succession, the first one is a scratch vm, and the other is the
'real' vm. With that bit of info, I suspect we may be trying to create
the debugfs directory twice, and the second time it's failing because
it already exists.
Cheers,
Tyler
After a quick look into qemu I guess I've found the problem:
kvm_init creates a vm, does checking and self initialization and
then calls kvm_arch_init. The arch initialization indirectly
calls kvm_arm_create_scratch_host_vcpu and that's where the
trouble begins, as it also creates a VM.
My assumption was, that nobody would create multiple VMs under
the same PID. Christian and I are working on a solution on kernel
side.
Cheers
Janosch
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html