On 2012-06-28 11:31, Peter Lieven wrote:
> On 28.06.2012 11:21, Jan Kiszka wrote:
>> On 2012-06-28 11:11, Peter Lieven wrote:
>>> On 27.06.2012 18:54, Jan Kiszka wrote:
>>>> On 2012-06-27 17:39, Peter Lieven wrote:
>>>>> Hi all,
>>>>>
>>>>> I debugged this further and found out that kvm-kmod-3.0 is working with
>>>>> qemu-kvm-1.0.1 while kvm-kmod-3.3 and kvm-kmod-3.4 are not. What works
>>>>> as well is kvm-kmod-3.4 with an old userspace (qemu-kvm-0.13.0).
>>>>> Does anyone have a clue which new KVM feature could cause this if a
>>>>> vcpu is in an infinite loop?
>>>> Before accusing kvm-kmod ;), can you check if the effect is visible with
>>>> an original Linux 3.3.x or 3.4.x kernel as well?
>>> Sorry, I should have been more specific; maybe I also misunderstood
>>> something. I was assuming that kvm-kmod-3.0 is basically what is in the
>>> vanilla kernel 3.0. If I use the Ubuntu kernel from Ubuntu Oneiric
>>> (3.0.0) it works; if I use a self-compiled kvm-kmod-3.3/3.4 with that
>>> kernel, it doesn't.
>>> However, maybe we don't have to dig too deep - see below.
>> kvm-kmod wraps and patches things to make the KVM code from 3.3/3.4 work
>> on an older kernel. This step may introduce bugs of its own. Hence my
>> suggestion to first of all use a "real" 3.x kernel to exclude that risk.
>>
>>>> Then, bisecting the change in qemu-kvm that apparently resolved the
>>>> issue would be interesting.
>>>>
>>>> If we have to dig deeper, tracing [1] the lockup would likely be
>>>> helpful (all events of the qemu process, not just KVM-related ones:
>>>> trace-cmd record -e all qemu-system-x86_64 ...).
>>> This here is basically what's going on:
>>>
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa0000 val 0x10ff
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa0000 gpa 0xa0000 Read GPA
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa0000 val 0x10ff
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa0000 gpa 0xa0000 Read GPA
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa0000 val 0x10ff
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa0000 gpa 0xa0000 Read GPA
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa0000 val 0x10ff
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa0000 gpa 0xa0000 Read GPA
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa0000 val 0x0
>>>  qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6)
>>>
>>> It's doing that forever. This is tracing the kvm module. Doing the
>>> qemu-system-x86_64 trace is a bit complicated, but maybe this is already
>>> sufficient. Otherwise I will of course gather this info as well.
>> That's only tracing KVM events, and it's tracing when things already went
>> wrong. We may need a full trace (-e all), specifically for the period
>> when the pattern above started.
> I will do that. Maybe I should explain that the vcpu is executing
> garbage when the above starts. It's basically booting from an empty
> hard disk.
>
> If I understand correctly, qemu-kvm loops in kvm_cpu_exec(CPUState *env).
>
> Maybe the time to handle the monitor/QMP connection is just too short.
> If I further understand correctly, it can only handle monitor connections
> while qemu-kvm is executing kvm_vcpu_ioctl(env, KVM_RUN, 0) - or am I
> wrong here? The time spent in this state might be rather short.

Unless you played with priorities and affinities, the Linux scheduler
should provide the required time to the iothread.

>
> My concern is not that the machine hangs, just that the hypervisor is
> unresponsive and it's impossible to reset it or quit gracefully. The
> only way to end the hypervisor is via SIGKILL.

Right. Even if the guest runs wild, you must be able to control the VM
via the monitor etc. If not, that's a bug.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux