Re: EPT: Misconfiguration

On Wed, Jan 26, 2011 at 10:52, Avi Kivity <avi@xxxxxxxxxx> wrote:
> On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:
>>
>> > When you say "suddenly", this was with no changes to software and
>> > hardware?
>>
>> The host software and hardware haven't changed in the two months since
>> the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.
>>
>> We host customer vms on it though, so virtual machines come and go.
>> Various operating systems, a mixture of Linux, FreeBSD and Windows
>> 2008 R2. We have other machines with the same config without these
>> problems though.
>
> Are those other machines running a similar workload?

Yes, similar, or they're more heavily loaded.

On this machine, about half of the 48GB memory was used for virtual machines.

> The traces look awfully like bad hardware, though that can also be explained
> by random memory corruption due to a bug.

Yeah, that's what I'm expecting. We already replaced the memory; the next
step is to move the disks over to another server to make sure it's not
the board or the CPUs.

>> This time I have a few different messages though:
>>
>> 2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault:
>> 0000 [#1] SMP
>>
>> RSI: 0000000000000000 RDI: 1603a07305001568
>>
>> 2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
>> 08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
>> 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 <f0> ff 4f 08 0f 94 c0 84
>> c0 74 10 85 f6 75 07 e8 63 fe ff ff eb
>
> lock decl 0x8(%rdi)
>
> %rdi is completely crap, looks like corruption again.  Strangely, it is
> similar to the bad spte from the previous trace: 0x1603a0730500d277.  The
> upper 48 bits are identical; the lower 16 bits are different:
>>
>> 2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
>> page table at address 7f37b37ff000
>> 2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
>> 94e538067 PMD 61e5bf067 PTE 1603a0730500e067
>
> Here are those magic 48 bits again, in the PTE entry.
>>
>> 2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
>> 2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
>> 2011-01-25T12:38:49.417526+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
>> 2011-01-25T12:38:49.417532+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x5db595007 level 3
>> 2011-01-25T12:38:49.417553+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
>> 2011-01-25T12:38:49.417558+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1
>
> Again.
>
>> 2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
>> process qemu-kvm  pte:1603a0730500d067 pmd:61059f067
>
> Again.
>
> However, these all came from a single boot, yes?

Correct.

> If so, they can be the same
> corruption.  Please collect more traces, with reboots in between.

Ok, thanks, will do.
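Avi's observation can be checked mechanically: every corrupted value quoted in this thread shares the same upper 48 bits (0x1603a0730500) and differs only in the low 16 bits. The Python snippet below is a minimal sketch, not part of the original exchange; the helper names (upper48, lower16, is_canonical) are illustrative only. It also tests whether %rdi + 8, the operand of `lock decl 0x8(%rdi)`, is a canonical x86-64 address, assuming 48-bit virtual addresses (4-level paging). A non-canonical address raises a general protection fault rather than a page fault, which is consistent with the oops above.

```python
# Corrupted 64-bit values copied from the logs quoted in this thread.
CORRUPT_VALUES = {
    "RDI at the GP fault": 0x1603A07305001568,
    "bad spte from the previous trace": 0x1603A0730500D277,
    "PTE in 'Corrupted page table'": 0x1603A0730500E067,
    "EPT level-1 spte": 0x1603A07305006277,
    "pte in 'Bad page map'": 0x1603A0730500D067,
}


def upper48(value: int) -> int:
    """Return the upper 48 bits of a 64-bit value (drop the low 16 bits)."""
    return (value >> 16) & 0xFFFF_FFFF_FFFF


def lower16(value: int) -> int:
    """Return the low 16 bits of a 64-bit value."""
    return value & 0xFFFF


def is_canonical(addr: int) -> bool:
    """Assumes 48-bit virtual addresses: bits 63..47 must all be equal."""
    top = addr >> 47  # the 17 bits 63..47 of a 64-bit address
    return top == 0 or top == (1 << 17) - 1


if __name__ == "__main__":
    for name, v in CORRUPT_VALUES.items():
        print(f"{name:35s} upper48={upper48(v):#014x} low16={lower16(v):#06x}")
    prefixes = {upper48(v) for v in CORRUPT_VALUES.values()}
    print("all share the same upper 48 bits:", len(prefixes) == 1)
    rdi = CORRUPT_VALUES["RDI at the GP fault"]
    print("rdi + 8 is canonical:", is_canonical(rdi + 8))
```

Masking rather than string comparison keeps the check independent of how each log line happened to format the value (with or without a 0x prefix).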

Kind regards,

Ruben