Re: 2nd level lockups using VMX nesting on 3.11 based host kernel

On 03.09.2013 20:13, Gleb Natapov wrote:
> On Tue, Sep 03, 2013 at 03:19:27PM +0200, Stefan Bader wrote:
>> With current 3.11 kernels we got reports of nested qemu failing in weird ways. I
>> believe 3.10 also had issues before. Not sure whether those were the same.
>> With 3.8 based kernels (close to current stable) I found no such issues.
> Try to bisect it.

It took a while to bisect, though I am not sure the result helps much. Starting
from v3.9, the first broken commit is:

commit 5f3d5799974b89100268ba813cec8db7bd0693fb
KVM: nVMX: Rework event injection and recovery

That sounds plausible, as the commit changes event injection between nested
levels. However, starting with this patch I am unable to start any second level
guest. Very soon after the second level guest starts, the first level guest (and
with it the second level as well) locks up completely without any visible
messages.

This behaviour persists until

commit 5a2892ce72e010e3cb96b438d7cdddce0c88e0e6
KVM: nVMX: Skip PF interception check when queuing during nested run

In between there was also a period where the first level guest did not lock up
but either seemed not to schedule the second level guest or displayed internal
error messages when starting the second level.

Given that, it sounds like the current double faults in the second level guest
might be one of the issues introduced by the injection rework that remains to
this day, while the other issues were fixed from the second commit on.

I am not deeply familiar with the nVMX code, just trying to make sense of my
observations. The double fault always seems to originate from the cmos_interrupt
function in the second level guest. It does not happen immediately and sometimes
took several repeated runs to trigger (during the bisect I required 10
successful test runs before marking a kernel good). So could it maybe be some
event / interrupt (cmos related?) that accidentally gets injected into the wrong
guest level? Or maybe the same event taking place at the same time for more than
one level and messing things up?
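
For anyone wanting to repeat the bisect, the "10 successful runs before marking
it good" rule could be scripted roughly as below. This is only a sketch and not
the script I actually used: the test command is whatever you pass on the
command line, and the names classify, run_command and the 600 second timeout
are assumptions made up for illustration. A run that hangs is counted as a
failure, since the lockups produce no output to check for.

```python
import subprocess
import sys

REQUIRED_PASSES = 10   # number of clean runs required, as described above
TIMEOUT_SECONDS = 600  # assumed limit; a run that hangs counts as a failure


def classify(run_once, required_passes=REQUIRED_PASSES):
    """Return "bad" at the first failing run, "good" only if every run passes."""
    for _ in range(required_passes):
        if not run_once():
            return "bad"
    return "good"


def run_command(cmd, timeout=TIMEOUT_SECONDS):
    """Run the test command once; True means it exited with status 0 in time."""
    try:
        return subprocess.run(cmd, timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        # A lockup shows up as a run that never finishes.
        return False


def exit_code(verdict):
    """Map the verdict to what 'git bisect run' expects: 0 good, 1 bad."""
    return 0 if verdict == "good" else 1


if __name__ == "__main__" and len(sys.argv) > 1:
    test_cmd = sys.argv[1:]  # the test command and its arguments
    sys.exit(exit_code(classify(lambda: run_command(test_cmd))))
```

It would be hooked up as "git bisect run python3 <this script> <test command>".
Stopping at the first failure keeps bad kernels cheap to classify, and only the
good ones cost the full 10 runs. Note that this only helps if the test runs on
a machine that survives the lockup, otherwise each step still needs a manual
reboot.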

-Stefan
