Re: VMX: System lock-up in guest mode, BIOS under suspect

On 02.10.2010 19:25, Avi Kivity wrote:
>   On 10/01/2010 06:30 PM, Jan Kiszka wrote:
>> Hi,
>>
>> for the past few days I've been trying to understand a very strange hard
>> lock-up of some Intel i7 boxes when running our 16-bit guest OS under
>> KVM. After adding some instrumentation before and after the VM entry
>> (e.g. direct writes to VGA memory), it turned out that the system is
>> apparently stuck inside guest mode!
> 
> Strictly speaking, it could also be a crash in the small window between 
> vmexit and your writes.  However it's likely to be as you say.
> 
>> I double-checked that VM exits on external IRQs and NMIs are properly
>> enabled in the VMCS - they are. I also tried to capture any potential
>> last words via serial console and even via remote DMA over FireWire -
>> nothing. This likely means that not only the one core in guest mode is
>> stuck but all the others as well (note: the freeze is reproducible both
>> in UP and SMP mode). Very uncommon for an OS crash, I would say...
>>
>> So I decided to go for some nice conspiracy theory and put SMIs and
>> related BIOS code under suspect. Interestingly, this worked out:
>>
>> After disabling all SMIs on my box (Fujitsu Celsius H700) via the
>> chipset register, the hard freezes have not occurred again so far. My
>> customer was able to confirm this on some Lenovo notebook as well. We
>> are currently collecting data about the affected systems to correlate
>> it, and we are performing longer test runs.
>>
>> Nevertheless, I would like to collect some first comments on this. I'm
>> specifically wondering...
>>
>>   - if there is anything the host OS can mess up to make VM exits crash
>>     on the way into SMM or out again (I cannot imagine it, as the SMM
>>     monitor should always be able to run, at least in the absence of
>>     CPU errata).
> 
> Yes.  It's basically a small hypervisor, and the host OS is its guest.  
> So a well written SMM handler should not depend on any OS setting.  
> Whether they're actually tested this way is another matter.
> 
>>   - what the SMM monitor could do wrong to cause such a crash,
>>     especially as it looks like the hardware does all the switching for
>>     it.
> 
> Looks like SMM saves some handler-visible state when EPT is enabled.  
> Are all your failures on EPT-capable hosts?  If so, what happens when 
> EPT is disabled?

All Core i7 CPUs should support EPT, so it should be enabled on all
affected systems. However, ept=0 makes no difference on my box; it still
locks up.
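
For anyone repeating the ept=0 test, it may be worth confirming that the
parameter really took effect after reloading kvm_intel. A minimal Python
sketch, assuming the usual sysfs location of the module parameter and its
Y/N read-back format; the helper name ept_enabled is made up for
illustration:

```python
from pathlib import Path

# Assumed sysfs location of the kvm_intel "ept" module parameter.
EPT_PARAM = Path("/sys/module/kvm_intel/parameters/ept")


def ept_enabled(param_path=EPT_PARAM):
    """Return True/False for the ept parameter, or None if it cannot be read."""
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        return None  # kvm_intel not loaded, or not an Intel host
    # Kernel bool parameters read back as "Y" or "N"; accept "1"/"0" as well.
    if value in ("Y", "y", "1"):
        return True
    if value in ("N", "n", "0"):
        return False
    return None


if __name__ == "__main__":
    print("ept enabled:", ept_enabled())
```

A result of None only means the file could not be read or parsed; it says
nothing about the hardware capability itself.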

> 
>>   - if there could still be some KVM crash around host<->guest switching
>>     that just happens to be triggered by the SMI noise and that affects
>>     the whole system (including cores that do not host KVM threads).
>>
>> Any ideas warmly welcome!
> 
> Besides trying with ept=0, I suggest looking for machines that have SMIs 
> but do not crash.  If we find them, this seems to indicate a badly 
> written SMM handler.  If not, then there may be a systemic problem with 
> kvm (or perhaps all SMM handlers are badly written).

We are looking into the BIOS vendors. In my case, it is Phoenix, but at
least the Lenovo ones have been re-branded.
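
To help sort machines into "has SMIs" and "has no SMIs" and to record the
BIOS vendor alongside, something like the following could be run on each
box. It is only a sketch under assumptions: it relies on the SMI counter
MSR (0x34) that Nehalem-class and later Intel CPUs provide, on the msr
driver being loaded (/dev/cpu/N/msr, root required), and on the DMI fields
exported under /sys/class/dmi/id. The function names are hypothetical.

```python
import os
import struct
from pathlib import Path

MSR_SMI_COUNT = 0x34  # SMI counter MSR (assumed: Nehalem or later Intel CPU)
DMI_DIR = Path("/sys/class/dmi/id")  # assumed sysfs DMI export


def read_smi_count(cpu=0, msr_path=None):
    """Read the SMI counter of one CPU through the msr character device."""
    path = msr_path or f"/dev/cpu/{cpu}/msr"
    fd = os.open(path, os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the MSR index
        # and returns the 64-bit value as 8 bytes.
        raw = os.pread(fd, 8, MSR_SMI_COUNT)
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]


def read_dmi(field, dmi_dir=DMI_DIR):
    """Return one DMI field (e.g. 'bios_vendor'), or None if unavailable."""
    try:
        return (Path(dmi_dir) / field).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    for field in ("sys_vendor", "product_name", "bios_vendor", "bios_version"):
        print(f"{field}: {read_dmi(field)}")
    try:
        print("SMI count (cpu0):", read_smi_count(0))
    except OSError as err:
        print("could not read MSR (need root and 'modprobe msr'?):", err)
```

Sampling the counter before and after a guest run would show whether SMIs
arrived while the guest was executing.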

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux