On Mon, Mar 16, 2015 at 04:10:40PM +0100, Saso Slavicic wrote:
> Hi,
>
> I'm fairly experienced with KVM (Centos 5/6), running about a dozen servers
> with 20-30 different (Linux & MS platform) systems.
> I have one Windows XP machine that acts very strangely - it freezes. I get
> ping timeout for the VM from my monitoring and the machine spins 2 or 3
> cores using all the cpu. Now the interesting thing that happens is that once
> you open the console, it suddenly starts working again. You can see the
> clock catching up as it was frozen in time and everything works normally
> once the timer catches up. It usually happens probably about once a month,
> although it happened yesterday and today again.
>
> This machine is on Centos 6, qemu-kvm-0.12.1.2-2.448.el6_6, kernel
> 2.6.32-504.3.3.el6.x86_64.
> I was able to do some debugging when the machine was frozen, so I got some
> things to work with:
>
> # virsh qemu-monitor-command --hmp DBserver 'info cpus'
> * CPU #0: pc=0x0000000080501fdd thread_id=32595
>   CPU #1: pc=0x00000000806e7a9b thread_id=32596
>   CPU #2: pc=0x00000000ba2da162 (halted) thread_id=32597
>   CPU #3: pc=0x00000000ba2da162 (halted) thread_id=32598
>
> Now, in both yesterday's and today's event the CPU0 was stopped at
> 0x0000000080501fdd.
> I've disassembled the function and got this:
>
> 0x0000000080501fb5: int3
> 0x0000000080501fb6: mov %edi,%edi
> 0x0000000080501fb8: push %ebp
> 0x0000000080501fb9: mov %esp,%ebp
> 0x0000000080501fbb: push %esi
> 0x0000000080501fbc: mov %fs:0x20,%eax
> 0x0000000080501fc2: mov 0x8(%ebp),%ecx
> 0x0000000080501fc5: lea -0x1(%ecx),%esi
> 0x0000000080501fc8: test %esi,%ecx
> 0x0000000080501fca: lea 0x7ec(%eax),%edx
> 0x0000000080501fd0: pop %esi
> 0x0000000080501fd1: je 0x80501fdd
> 0x0000000080501fd3: lea 0x7a0(%eax),%edx
> 0x0000000080501fd9: jmp 0x80501fdd
> *0x0000000080501fdb: pause
> 0x0000000080501fdd: cmpl $0x0,(%edx)
> 0x0000000080501fe0: jne 0x80501fdb
> 0x0000000080501fe2: pop %ebp
> 0x0000000080501fe3: ret $0x4
> 0x0000000080501fe6: int3
>
> Mov %edi,%edi is clearly the start of some function. From what I've been
> able to understand, the code fetches _KPRCB structure (%fs:0x20) and then
> does a spinlock between fdb and fe0 checking for PacketBarrier (?) in EDX
> (0xffdff8c0). Now, $pc always shows fdd address, shouldn't it jump between
> fdb and fe0, it seems as if it was stuck at fdd?
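
On whether $pc is really stuck at fdd: a single 'info cpus' call is one sample, so one way to find out is to take many samples and count where CPU0 lands. A minimal sketch in Python, assuming the 'info cpus' output format quoted above for this qemu-kvm build; parse_info_cpus and pc_histogram are hypothetical helper names, not part of any tool:

```python
import re
from collections import Counter

# One line of 'info cpus' output, as it appears in the quoted session.
CPU_LINE = re.compile(
    r"CPU #(\d+): pc=(0x[0-9a-fA-F]+)( \(halted\))? thread_id=(\d+)"
)


def parse_info_cpus(text):
    """Return {cpu_index: (pc, halted)} for one 'info cpus' output."""
    cpus = {}
    for m in CPU_LINE.finditer(text):
        cpus[int(m.group(1))] = (int(m.group(2), 16), m.group(3) is not None)
    return cpus


def pc_histogram(samples, cpu):
    """Count how often each pc value was seen for one CPU across samples."""
    return Counter(parse_info_cpus(s)[cpu][0] for s in samples)


# The sample from the original report.
sample = """\
* CPU #0: pc=0x0000000080501fdd thread_id=32595
  CPU #1: pc=0x00000000806e7a9b thread_id=32596
  CPU #2: pc=0x00000000ba2da162 (halted) thread_id=32597
  CPU #3: pc=0x00000000ba2da162 (halted) thread_id=32598
"""
```

Feeding it the text of a few dozen repeated 'info cpus' calls would show whether fdb and fe0 ever turn up. The loop is only three instructions long, so I would not read too much into a single address either way.
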
>
> # virsh qemu-monitor-command --hmp DBserver 'info registers'
> EAX=ffdff120 EBX=c06ddf58 ECX=0000000e EDX=ffdff8c0
> ESI=be6e3921 EDI=c06ddf60 EBP=ba4ff708 ESP=ba4ff708
> EIP=80501fdd EFL=00000202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0023 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
> CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
> SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
> DS =0023 00000000 ffffffff 00c0f300 DPL=3 DS [-WA]
> FS =0030 ffdff000 00001fff 00c09300 DPL=0 DS [-WA]
> GS =0000 00000000 000fffff 00000000
> LDT=0000 00000000 000fffff 00000000
> TR =0028 80042000 000020ab 00008b00 DPL=0 TSS32-busy
> GDT= 8003f000 000003ff
> IDT= 8003f400 000007ff
> CR0=8001003b CR2=dbbec000 CR3=0b3c0020 CR4=000006f8
> DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
> DR6=ffff0ff0 DR7=00000400
> FCW=027f FSW=0020 [ST=0] FTW=00 MXCSR=00001fa0
> FPR0=8053632b003c1658 c048 FPR1=e1e0c048bf80f6ab 76f8
> FPR2=e1e0000000000000 0023 FPR3=0b017c30003c1658 0000
> FPR4=0000003bba1a7604 1e64 FPR5=0007268c00000000 003b
> FPR6=000002020000001b 2684 FPR7=e3e0a9b4e1b50de4 ca0b
> XMM00=0000000000a1fc95000000000020027f
> XMM01=0000ffff00001fa000001c4c00000001
> XMM02=000000000000c0488053632b003c1658
> XMM03=00000000000076f8e1e0c048bf80f6ab
> XMM04=0000000000000023e1e0000000000000
> XMM05=00000000000000000b017c30003c1658
> XMM06=0000000000001e640000003bba1a7604
> XMM07=000000000000003b0007268c00000000
>
> Clearly, the address in EDX is not 0:
>
> [root@linux ~]# virsh qemu-monitor-command --hmp DBserver 'x/1xb 0xFFDFF8C0'
> 00000000ffdff8c0: 0x0e
>
> [root@linux ~]# virt-manager
>
> [root@linux ~]# virsh qemu-monitor-command --hmp DBserver 'x/1xb 0xFFDFF8C0'
> 00000000ffdff8c0: 0x00
>
> However as soon as the VM console is opened and machine starts, the address
> in EDX is set to 0 and the loop is broken.
> Does anybody recognize what function that is? What could possibly happen
> that opening the console and moving the mouse a little, unfreezes the
> machine?
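
The register dump is at least consistent with the disassembly. A small Python sketch that replays the branch at 0x80501fc8-0x80501fd9 with the quoted register values; the two offsets come straight from the listing, while the names polled_address and target_set are made up for illustration and say nothing certain about what the fields mean:

```python
def polled_address(prcb, target_set):
    """Pick the address the pause/cmpl/jne loop polls.

    Mirrors: lea -1(%ecx),%esi ; test %esi,%ecx ; je -> prcb+0x7ec,
    otherwise prcb+0x7a0.
    """
    # (x & (x - 1)) == 0 holds when x is zero or has exactly one bit set.
    zero_or_single_bit = (target_set & (target_set - 1)) == 0
    offset = 0x7EC if zero_or_single_bit else 0x7A0
    return prcb + offset


# Values taken from the 'info registers' output quoted above.
EAX = 0xFFDFF120  # loaded from %fs:0x20
ECX = 0x0000000E  # argument at 0x8(%ebp)
EDX = 0xFFDFF8C0  # address the loop compares against zero

computed = polled_address(EAX, ECX)
```

With ECX=0xe the test is non-zero, so the loop polls EAX+0x7a0, and that works out to the 0xffdff8c0 seen in EDX. The byte read back from that address while frozen is also 0x0e. If ECX is a processor mask (an assumption on my side), bits 1-3 would be every CPU except CPU0.
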
> VM has .81 virtio drivers from Fedora repo at the moment.

Generate a Windows dump?

https://support.microsoft.com/en-us/kb/254649
https://support.microsoft.com/en-us/kb/972110

Step 7: Generate a complete crash dump file or a kernel crash dump file by
using an NMI on a Windows-based system (you can inject NMIs via QEMU monitor).

>
> The configuration of the machine is pretty standard:
>
> <!--
> WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
> OVERWRITTEN AND LOST. Changes to this xml configuration should be made
> using:
>   virsh edit DBserver
> or other application using the libvirt API.
> -->
>
> <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
>   <name>DBserver</name>
>   <uuid>e42b4cf2-7264-515f-4d24-6267eaa24be8</uuid>
>   <memory unit='KiB'>3145728</memory>
>   <currentMemory unit='KiB'>3145728</currentMemory>
>   <vcpu placement='static'>4</vcpu>
>   <os>
>     <type arch='x86_64' machine='rhel6.6.0'>hvm</type>
>     <boot dev='hd'/>
>   </os>
>   <features>
>     <acpi/>
>     <apic/>
>     <pae/>
>   </features>
>   <cpu>
>     <topology sockets='1' cores='4' threads='4'/>
>   </cpu>
>   <clock offset='localtime'>
>     <timer name='rtc' tickpolicy='catchup'/>
>   </clock>
>   <on_poweroff>destroy</on_poweroff>
>   <on_reboot>restart</on_reboot>
>   <on_crash>restart</on_crash>
>   <devices>
>     <emulator>/usr/libexec/qemu-kvm</emulator>
>     <disk type='block' device='disk'>
>       <driver name='qemu' type='raw' cache='none' io='native'/>
>       <source dev='/dev/drbd1'/>
>       <target dev='vda' bus='virtio'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>     </disk>
>     <disk type='block' device='disk'>
>       <driver name='qemu' type='raw' cache='none' io='native'/>
>       <source dev='/dev/disk/by-id/usb-WD_Ext_HDD_1021_574D415A4138353838383731-0:0'/>
>       <target dev='vdb' bus='virtio'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
>     </disk>
>     <disk type='file' device='cdrom'>
>       <driver name='qemu' type='raw'/>
>       <target dev='hdc' bus='ide'/>
>       <readonly/>
>       <address type='drive' controller='0' bus='1' target='0' unit='0'/>
>     </disk>
>     <controller type='usb' index='0' model='ich9-ehci1'>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
>     </controller>
>     <controller type='usb' index='0' model='ich9-uhci1'>
>       <master startport='0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
>     </controller>
>     <controller type='usb' index='0' model='ich9-uhci2'>
>       <master startport='2'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
>     </controller>
>     <controller type='usb' index='0' model='ich9-uhci3'>
>       <master startport='4'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
>     </controller>
>     <controller type='ide' index='0'>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
>     </controller>
>     <interface type='bridge'>
>       <mac address='52:54:00:a6:92:ca'/>
>       <source bridge='br0'/>
>       <model type='virtio'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
>     </interface>
>     <serial type='pty'>
>       <target port='0'/>
>     </serial>
>     <console type='pty'>
>       <target type='serial' port='0'/>
>     </console>
>     <input type='mouse' bus='ps2'/>
>     <graphics type='vnc' port='-1' autoport='yes'/>
>     <video>
>       <model type='vga' vram='9216' heads='1'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
>     </video>
>     <memballoon model='virtio'>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
>     </memballoon>
>   </devices>
>   <qemu:commandline>
>     <qemu:arg value='-set'/>
>     <qemu:arg value='device.virtio-disk0.x-data-plane=on'/>
>   </qemu:commandline>
> </domain>
>
> The above config is already changed as I've first experimented with removing
> usb tablet (and installing vmware mouse drivers), turning 'x-data-plane on'
> and so on, hoping to solve the problem...Is
> there anything else I can check
> the next time the machine freezes?
>
> Regards,
> Saso Slavicic
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
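
For the next freeze it may help to have the commands ready in one place before anyone opens the console, since opening it clears the condition. A rough Python sketch that only builds the command lines and prints them by default (dry run). Using `virsh inject-nmi` for the NMI step is my suggestion, it assumes the guest was already set up for NMI-triggered dumps as per the KB articles above, and freeze_capture_commands / run_all are hypothetical names:

```python
import shlex
import subprocess


def freeze_capture_commands(domain, poll_addr):
    """Return the virsh command lines to run, in order, while the guest is frozen."""
    hmp = ["virsh", "qemu-monitor-command", "--hmp", domain]
    return [
        hmp + ["info cpus"],
        hmp + ["info registers"],
        # The loop does a 32-bit cmpl, so read a whole word rather than a byte.
        hmp + ["x/1xw 0x%08x" % poll_addr],
        # Last step: ask the guest for a crash dump via NMI.
        ["virsh", "inject-nmi", domain],
    ]


def run_all(commands, dry_run=True):
    """Run each command, or just return the shell-quoted lines when dry_run is set."""
    outputs = []
    for argv in commands:
        if dry_run:
            outputs.append(" ".join(shlex.quote(arg) for arg in argv))
        else:
            outputs.append(subprocess.check_output(argv).decode())
    return outputs


# Domain name and address are the ones from the original report.
commands = freeze_capture_commands("DBserver", 0xFFDFF8C0)
preview = run_all(commands)
```

Note that 'info registers' reports the monitor's current CPU only, which was CPU #0 in the session above.
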