Re: KVM guest crashes

Alexander Graf <agraf@xxxxxxx> · Sat, 24 Jan 2009 08:42:06 +0100

Hi Marcelo,

On 23.01.2009, at 23:36, Marcelo Tosatti wrote:

Hi Alexander,

On Thu, Jan 22, 2009 at 09:29:46PM +0100, Alexander Graf wrote:

Following the discussion on IRC, I tried -no-kvm-irqchip and found some
virtual machines broken after >1 day of stress testing again:

+ sudo -u contain2 env -i qemu-kvm -localtime -kernel virtio-kernel
-initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm
cifsuser=contain2 cifspass=contain2 root=cifs://contain2:contain2@xxxxx
6.2.1/contain2 realroot=//172.16.2.1/users/contain2
ip=172.16.2.2:172.16.2.1::255.255.255.0::eth0:none console=ttyS0
dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:2 -net
tap,ifname=tap2,script=/bin/true -m 2000 -nographic -smp 4 -no-kvm-irqchip /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000
Stuck ??
Stuck ??
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<ffffffff802b539a>] kfree+0x18b/0x26e
PGD 0
Oops: 0000 [1] SMP
last sysfs file:
CPU 2
Modules linked in:
Supported: Yes
Pid: 0, comm: swapper Tainted: G S        2.6.27.7-9-default #1
RIP: 0010:[<ffffffff802b539a>]  [<ffffffff802b539a>] kfree+0x18b/0x26e
RSP: 0018:ffff88007a493e90  EFLAGS: 00010046
RAX: 0000000000000002 RBX: ffff8800010397f0 RCX: ffff88007a480778
RDX: ffffe20000000000 RSI: ffff8800010397f0 RDI: ffff88007a5ae140
RBP: 0000000000000000 R08: ffff8800010395d0 R09: ffff88007a493eb8
R10: ffffffff80a59980 R11: ffffffff8021c5d9 R12: 0000000000000001
R13: ffff88007ac04080 R14: 0000000010200042 R15: ffff88007a5ae140
FS:  0000000000000000(0000) GS:ffff88007a461f40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88007a48a000, task ffff88007a488280)
Stack:  ffffffff8023df9c ffffffff8073a108 0000000000000286 ffffffff8024a1eb
ffffffff80259d80 ffff8800010397f0 0000000000000000 0000000000000001
000000000000000a 0000000010200042 0000000000000010 ffffffff802831d0
Call Trace:
[<ffffffff802831d0>] __rcu_process_callbacks+0x189/0x203
[<ffffffff80283271>] rcu_process_callbacks+0x27/0x47
[<ffffffff802464ed>] __do_softirq+0x84/0x115
[<ffffffff8020dc9c>] call_softirq+0x1c/0x28
[<ffffffff8020f067>] do_softirq+0x3c/0x81
[<ffffffff80246204>] irq_exit+0x3f/0x83
[<ffffffff8021ce5f>] smp_apic_timer_interrupt+0x95/0xae
[<ffffffff8020d4a3>] apic_timer_interrupt+0x83/0x90
[<ffffffff80221f1d>] native_safe_halt+0x2/0x3
[<ffffffff80213465>] default_idle+0x38/0x54
[<ffffffff8020b34a>] cpu_idle+0xa9/0xf1

Code: 01 00 00 00 e8 4c fa ff ff 48 83 3d a0 19 44 00 00 49 8b 44 dd 08
48 8d 78 40 75 04 0f 0b eb fe e8 e5 cc f6 ff 90 e9 c7 00 00 00 <8b> 55
00 3b 55 04 73 0f 89 d0 4c 89 7c c5 18 8d 42 01 e9 ad 00
RIP  [<ffffffff802b539a>] kfree+0x18b/0x26e
RSP <ffff88007a493e90>
CR2: 0000000000000000
---[ end trace 4eaa2a86a8e2da22 ]---

Also, after two days of continuous stress testing, the Intel
machine with current git went down as well:

+ sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime
-kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
clocksource=acpi_pm cifsuser=contain1 cifspass=contain1
root=cifs://contain1:contain1@xxxxxxxxxx/contain1
realroot=//172.16.1.1/users/contain1
ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0
dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net
tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000
Stuck ??

No backtrace here though. That's all I got from the serial console.

The only issue I have had with the UP guests so far is this:

+ taskset -c 6 sudo -u contain6 env -i qemu-kvm -localtime -kernel
virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
clocksource=acpi_pm cifsuser=contain6 cifspass=contain6
root=cifs://contain6:contain6@xxxxxxxxxx/contain6
realroot=//172.16.6.1/users/contain6
ip=172.16.6.2:172.16.6.1::255.255.255.0::eth0:none console=ttyS0
dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:6 -net
tap,ifname=tap6,script=/bin/true -m 2000 -nographic /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000
..MP-BIOS bug: 8254 timer not connected to IO-APIC
Kernel panic - not syncing: IO-APIC + timer doesn't work!  Boot with
apic=debug and send a report.  Then try booting with the 'noapic' option.

which can be annoying at times too. Can't we just detect that the guest is
running the timer check and give it its interrupts? Or should the PIT
reinjection thing help here?

There are a number of problems that can result in this error, and the
problems are possibly different between the in-kernel PIT and userspace
PIT emulation (note it also happens with in-kernel PIT, just much more
rarely now). You can use the no_timer_check kernel option to bypass it.
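
For the UP guest above, bypassing the check would mean adding
no_timer_check to the string passed to -append. A minimal sketch, assuming
a shortened kernel command line; add_kernel_opt and APPEND are hypothetical
names used for illustration, not part of qemu or the kernel:

```shell
#!/bin/sh
# add_kernel_opt CMDLINE OPT (hypothetical helper):
# print CMDLINE with OPT appended, unless OPT is already one of its words.
add_kernel_opt() {
    case " $1 " in
        *" $2 "*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "$1 $2" ;;
    esac
}

# Shortened version of the guest command line, for illustration only.
APPEND='quiet clocksource=acpi_pm console=ttyS0 dhcp=off builder=1'
APPEND=$(add_kernel_opt "$APPEND" no_timer_check)

# The result is what would be handed to qemu as: -append "$APPEND"
echo "$APPEND"
```

This only skips the guest's timer check. It does not address whatever made
the check fail in the first place.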

Ok :-). Thanks. The logic in the kernel for this is really stupid
(basing timing on clock speed). What about disabling the check if we
detect KVM?

Regarding the corruption problem, I have a few questions:

- Is it SMP-specific (i.e. do both kernel and userspace irqchip fail)?
	- That would mean UP guests are stable with both kernel and
	  userspace irqchip.

I have not been able to reproduce any of my issues with UP. I have to
admit that I only tried UP with in-kernel irqchip.

The "Stuck ??" messages seem to be coming from smpboot.c, so for some
reason vcpus are being reset. It doesn't seem to be a triple fault, because
in that case all vcpus would be reset (so yes, the vcpu really was
executing BIOS code).

Hm. I know that OSX turns off CPUs it doesn't need as an alternative
to deep-sleep. Does Linux do that too?

Suggest the following:
- Confirm the problem happens with root on ext3 filesystem (can't you
 mount the CIFS and copy the data over to a local guest disk to
 simulate similar load?).

I had Stuck ?? messages without networking, but if it helps I can try
that too. In the project we're using this for we do things over cifs,
so that's why I built the test case around it.

- Check that the kernel text is not corrupted. Save the "good" kernel
 text with QEMU's "pmemsave" or "memsave" (you can see start/end in
 the symbols _text/_etext, /proc/kallsyms) after booting. After you
 see the crash, save the "bad" kernel text, compare. This can give
 additional clues (or not).

Good idea - I'll try.
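
The kernel text comparison suggested above could be prepared roughly like
this. It is only a sketch: kernel_text_range, memsave_command and the file
names are hypothetical, and it assumes a copy of the guest's /proc/kallsyms
is available on the host and that the kernel text is smaller than 4 GB. It
prints the "memsave" line to type into the QEMU monitor. Only the low 32
bits are subtracted so the arithmetic stays within what a plain /bin/sh
handles.

```shell
#!/bin/sh
# low32 ADDR: print the last 8 hex digits of a hex address string.
low32() {
    printf '%s' "$1" | tail -c 8
}

# kernel_text_range FILE: print "0xSTART SIZE" for the _text.._etext
# range found in a kallsyms-format FILE (SIZE is in decimal bytes).
kernel_text_range() {
    start=$(awk '$3 == "_text"  { print $1; exit }' "$1")
    end=$(awk '$3 == "_etext" { print $1; exit }' "$1")
    [ -n "$start" ] && [ -n "$end" ] || return 1
    size=$(( 0x$(low32 "$end") - 0x$(low32 "$start") ))
    # Handle the range crossing a 4 GB boundary in the low 32 bits.
    [ "$size" -ge 0 ] || size=$(( size + 4294967296 ))
    printf '0x%s %d\n' "$start" "$size"
}

# memsave_command FILE OUT: print the monitor command that would dump
# the kernel text described by FILE into the host file OUT.
memsave_command() {
    range=$(kernel_text_range "$1") || return 1
    printf 'memsave %s %s %s\n' "${range% *}" "${range#* }" "$2"
}

# Example use (file names are placeholders):
#   memsave_command guest-kallsyms kernel-text.good   # right after boot
#   memsave_command guest-kallsyms kernel-text.bad    # after the crash
#   cmp -l kernel-text.good kernel-text.bad | head    # list differing bytes
```

memsave works on guest virtual addresses, so it matches the addresses in
/proc/kallsyms directly. pmemsave would need the physical address instead.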

Also, you mentioned "other reports" previously, can you point to them,
please?

Yes, will do later. I gotta run now! Thanks for the reply - it's good
to know this isn't getting ignored :-).

Alex
