Re: 2nd level lockups using VMX nesting on 3.11 based host kernel

On 10/09/2013 09:52, Stefan Bader wrote:
> On 03.09.2013 20:13, Gleb Natapov wrote:
>> On Tue, Sep 03, 2013 at 03:19:27PM +0200, Stefan Bader wrote:
>>> With current 3.11 kernels we got reports of nested qemu failing
>>> in weird ways. I believe 3.10 also had issues before. Not sure
>>> whether those were the same. With 3.8 based kernels (close to
>>> current stable) I found no such issues.
>> Try to bisect it.
> 
> It took a while to bisect, though I am not sure this helps much.
> Starting from v3.9, the first broken commit is:
> 
> commit 5f3d5799974b89100268ba813cec8db7bd0693fb KVM: nVMX: Rework
> event injection and recovery
> 
> This sounds reasonable, as it changes event injection between
> nested levels. However, starting with this patch I am unable to
> start any second level guest. Very soon after the second level
> guest starts, the first level (and with it the second) locks up
> completely without any visible messages.
> 
> This goes on until
> 
> commit 5a2892ce72e010e3cb96b438d7cdddce0c88e0e6 KVM: nVMX: Skip PF
> interception check when queuing during nested run

I'm not sure I'm seeing the same issue as you, but it is similar
enough to point out.

Nested virtualization is completely broken with shadow paging on the
host even before commit 5f3d5799974b89100268ba813cec8db7bd0693fb.
Whether it works probably depends on the combination of host and guest
kernels; I am always using 3.10 in the guest.  It is very
reproducible; my testcase is x86/realmode.flat from kvm-unit-tests.

There are several problems, some of which were fixed along the way.
While bisecting I did the following:

- apply patch 63fbf59 (nVMX: reset rflags register cache during nested
vmentry., 2013-07-28)

- use the emulate_invalid_guest_state=0 argument to kvm-intel.  This
is fixed somewhere between commit 5f3d579 and commit 205befd (KVM:
nVMX: correctly set tr base on nested vmexit emulation, 2013-08-04); I
haven't bisected it fully, but after that point the workaround should
no longer be necessary (a quick way to check the current setting is
sketched after this list).
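
As an aside, the parameter is normally exposed read-only in sysfs, so
something like the sketch below (my own example, assuming the usual
/sys/module/kvm_intel/parameters/ path) can confirm which mode
kvm-intel is currently running with:

#include <stdio.h>

/*
 * Minimal sketch, not part of any test suite: print the current value
 * of the kvm-intel emulate_invalid_guest_state module parameter,
 * assuming it is exposed under /sys/module/kvm_intel/parameters/.
 */
int main(void)
{
	const char *path =
		"/sys/module/kvm_intel/parameters/emulate_invalid_guest_state";
	FILE *f = fopen(path, "r");
	int c;

	if (!f) {
		perror(path);
		return 1;
	}
	printf("emulate_invalid_guest_state = ");
	while ((c = fgetc(f)) != EOF)
		putchar(c);
	fclose(f);
	return 0;
}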


The bisect lands on the same faulty commit as yours.  The symptoms are
the same for all three cases:

- commit 5f3d579 + patch 63fbf59 + emulate_invalid_guest_state=0

- commit 205befd + emulate_invalid_guest_state=0

- commit 205befd + emulate_invalid_guest_state=1

My first impression is that a page fault is injected erroneously; I
will look into it more tomorrow.
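
For reference, the architectural rule that decides whether a #PF in
L2 is reflected to L1 uses bit 14 of the exception bitmap together
with the page-fault error-code mask/match fields.  The sketch below
is a toy model written down only for illustration (the function and
variable names are mine; this is not the KVM code):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PF_VECTOR 14

/*
 * Toy model of the SDM rule: if (error_code & PFEC_MASK) == PFEC_MATCH,
 * the page fault causes a VM exit iff bit 14 of the exception bitmap is
 * set; otherwise it causes a VM exit iff that bit is clear.
 */
static bool pf_causes_vmexit_to_l1(uint32_t exception_bitmap,
				   uint32_t error_code,
				   uint32_t pfec_mask, uint32_t pfec_match)
{
	bool bit14 = exception_bitmap & (1u << PF_VECTOR);
	bool match = (error_code & pfec_mask) == pfec_match;

	return match ? bit14 : !bit14;
}

int main(void)
{
	/* Example: L1 intercepts all page faults (bit 14 set, mask/match 0). */
	printf("reflect to L1: %d\n",
	       pf_causes_vmexit_to_l1(1u << PF_VECTOR, 0x2, 0, 0));
	return 0;
}

Getting this check wrong at the point where an exception is queued
while L2 is running would be one way to end up with a spurious page
fault in the wrong level.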

Paolo

> In between there was also a period where the first level did not
> lock up but would either seem not to schedule the second level
> guest or would display internal error messages when starting the
> second level.
> 
> Given that, it sounds like the current double faults in the second
> level might be one of the issues introduced by the injection rework
> that remains until now, while other issues were fixed from the
> second commit on.
> 
> I am not really deeply familiar with the nVMX code, just trying to
> make sense of my observations. The double fault always seems to
> originate from the cmos_interrupt function in the second level
> guest. It is not immediate and sometimes takes several repeated runs
> to trigger (during the bisect I required 10 successful test runs
> before marking a commit good). So could it maybe be some event /
> interrupt (cmos related?) that accidentally gets injected into the
> wrong guest level? Or maybe the same event taking place at the same
> time for more than one level and messing things up?
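
Just to illustrate the kind of mix-up you are describing, here is a
toy model of the bookkeeping involved.  This is not the KVM nVMX
code; all names below are made up.  The point is only that an event
recorded while L2 is running has to be delivered to the level it was
destined for, and a wrong routing decision injects it into the other
level.

#include <stdio.h>

/*
 * Toy model for illustration only -- not the KVM implementation.
 * The hypervisor keeps a pending event and must route it to the
 * guest level it was meant for; delivering it while the other level
 * is running is exactly the kind of confusion suspected here.
 */
enum level { L1 = 1, L2 = 2 };

struct pending_event {
	int valid;
	int vector;		/* arbitrary example vector */
	enum level target;	/* which guest level the event is meant for */
};

static void inject(enum level running, const struct pending_event *ev)
{
	if (!ev->valid)
		return;
	if (ev->target != running)
		printf("BUG: vector %d meant for L%d injected while L%d runs\n",
		       ev->vector, ev->target, running);
	else
		printf("vector %d delivered to L%d\n", ev->vector, running);
}

int main(void)
{
	/* An interrupt destined for L1 arrives while L2 is running. */
	struct pending_event ev = { .valid = 1, .vector = 0xe1, .target = L1 };

	inject(L2, &ev);	/* wrong: a nested vmexit to L1 is needed first */
	inject(L1, &ev);	/* right: deliver after switching back to L1 */
	return 0;
}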
> 
> -Stefan
> 
