On 27 November 2015 at 09:08, Tyler Baker <tyler.baker@xxxxxxxxxx> wrote: > On 27 November 2015 at 00:54, Christian Borntraeger > <borntraeger@xxxxxxxxxx> wrote: >> On 11/26/2015 09:47 PM, Christian Borntraeger wrote: >>> On 11/26/2015 05:17 PM, Tyler Baker wrote: >>>> Hi Christian, >>>> >>>> The kernelci.org bot recently has been reporting kvm guest boot >>>> failures[1] on various arm64 platforms in next-20151126. The bot >>>> bisected[2] the failures to the commit in -next titled "KVM: Create >>>> debugfs dir and stat files for each VM". I confirmed by reverting this >>>> commit on top of next-20151126 it resolves the boot issue. >>>> >>>> In this test case the host and guest are booted with the same kernel. >>>> The host is booted over nfs, installs qemu (qemu-system arm64 2.4.0), >>>> and launches a guest. The host is booting fine, but when the guest is >>>> launched it errors with "Failed to retrieve host CPU features!". I >>>> checked the host logs, and found an "Unable to handle kernel paging >>>> request" splat[3] which occurs when the guest is attempting to start. >>>> >>>> I scanned the patch in question but nothing obvious jumped out at me, >>>> any thoughts? >>> >>> Not really. >>> Do you have processing running that do read the files in /sys/kernel/debug/kvm/* ? >>> >>> If I read the arm oops message correctly it oopsed inside >>> __srcu_read_lock. there is actually nothing in there that can oops, >>> except the access to the preempt count. I am just guessing right now, >>> but maybe the preempt variable is no longer available (as the process >>> is gone). As long as a debugfs file is open, we hold a reference to >>> the kvm, which holds a reference to the mm, so the mm might be killed >>> after the process. But this is supposed to work, so maybe its something >>> different. An objdump of __srcu_read_lock might help. >> >> Hmm, the preempt thing is done in srcu_read_lock, but the crash is in >> __srcu_read_lock. This function gets the srcu struct from mmu_notifier.c, >> which must be present and is initialized during boot. >> >> >> int __srcu_read_lock(struct srcu_struct *sp) >> { >> int idx; >> >> idx = READ_ONCE(sp->completed) & 0x1; >> __this_cpu_inc(sp->per_cpu_ref->c[idx]); >> smp_mb(); /* B */ /* Avoid leaking the critical section. */ >> __this_cpu_inc(sp->per_cpu_ref->seq[idx]); >> return idx; >> } >> >> Looking at the code I have no clue why the patch does make a difference. >> Can you try to get an objdump -S for__Srcu_read_lock? Some other interesting finding below... On the host, I do _not_ have any nodes under /sys/kernel/debug/kvm/ Running strace on the qemu command I use to launch the guest yields the following. [pid 5963] 1448649724.405537 mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6652a000 [pid 5963] 1448649724.405586 read(13, "MemTotal: 16414616 kB\nMemF"..., 1024) = 1024 [pid 5963] 1448649724.405699 close(13) = 0 [pid 5963] 1448649724.405755 munmap(0x7f6652a000, 65536) = 0 [pid 5963] 1448649724.405947 brk(0x2552f000) = 0x2552f000 [pid 5963] 1448649724.406148 openat(AT_FDCWD, "/dev/kvm", O_RDWR|O_CLOEXEC) = 13 [pid 5963] 1448649724.406209 ioctl(13, KVM_CREATE_VM, 0) = -1 ENOMEM (Cannot allocate memory) [pid 5963] 1448649724.406382 close(13) = 0 [pid 5963] 1448649724.406435 write(2, "Failed to retrieve host CPU feat"..., 38Failed to retrieve host CPU features! ) = 38 Tyler -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html