On 2012-06-28 11:31, Peter Lieven wrote:
> On 28.06.2012 11:21, Jan Kiszka wrote:
>> On 2012-06-28 11:11, Peter Lieven wrote:
>>> On 27.06.2012 18:54, Jan Kiszka wrote:
>>>> On 2012-06-27 17:39, Peter Lieven wrote:
>>>>> Hi all,
>>>>>
>>>>> I debugged this further and found out that kvm-kmod-3.0 is working with
>>>>> qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What works
>>>>> as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
>>>>> Does anyone have a clue which new KVM feature could cause this if a
>>>>> vcpu is in an infinite loop?
>>>> Before accusing kvm-kmod ;), can you check if the effect is visible with
>>>> an original Linux 3.3.x or 3.4.x kernel as well?
>>> Sorry, I should have been more specific; maybe I also misunderstood
>>> something. I was assuming that kvm-kmod-3.0 is basically what is in the
>>> vanilla kernel 3.0. If I use the Ubuntu kernel from Ubuntu Oneiric
>>> (3.0.0) it works; if I use a self-compiled kvm-kmod-3.3/3.4 with that
>>> kernel, it doesn't.
>>> However, maybe we don't have to dig too deep - see below.
>> kvm-kmod wraps and patches things to make the KVM code from 3.3/3.4 work
>> on an older kernel. This step may introduce bugs of its own. Hence my
>> suggestion to first of all use a "real" 3.x kernel to exclude that risk.
>>
>>>> Then, bisecting the change in qemu-kvm that apparently resolved the
>>>> issue would be interesting.
>>>>
>>>> If we have to dig deeper, tracing [1] the lockup would likely be
>>>> helpful (all events of the qemu process, not just KVM-related ones:
>>>> trace-cmd record -e all qemu-system-x86_64 ...).
>>> This here is basically what's going on:
>>>
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa0000 val 0x10ff
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa0000 gpa 0xa0000 Read GPA
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa0000 val 0x10ff
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa0000 gpa 0xa0000 Read GPA
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa0000 val 0x10ff
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa0000 gpa 0xa0000 Read GPA
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa0000 val 0x10ff
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa0000 gpa 0xa0000 Read GPA
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
>>>
>>> It's doing that forever. This is tracing the kvm module. Doing the
>>> qemu-system-x86_64 trace is a bit complicated, but maybe this is already
>>> sufficient. Otherwise I will of course gather this info as well.
>> That's only tracing KVM events, and it's tracing when things already went
>> wrong. We may need a full trace (-e all), specifically for the period
>> when the pattern above started.
> I will do that. Maybe I should explain that the vcpu is executing
> garbage when the above starts. It's basically booting from an empty
> hard disk.
>
> If I understand correctly, qemu-kvm loops in kvm_cpu_exec(CPUState *env).
>
> Maybe the time to handle the monitor/QMP connection is just too short.
> If I further understand correctly, it can only handle monitor connections
> while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0) - or am I
> wrong here? The time spent in this state might be rather short.

Unless you played with priorities and affinities, the Linux scheduler
should provide the required time to the iothread.

>
> My concern is not that the machine hangs, just that the hypervisor is
> unresponsive and it's impossible to reset it or quit gracefully. The
> only way to end the hypervisor is via SIGKILL.

Right. Even if the guest runs wild, you must be able to control the VM
via the monitor etc. If not, that's a bug.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux