On Sat, Mar 7, 2015 at 3:00 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
> On Fri, Mar 6, 2015 at 7:57 PM, Bandan Das <bsd@xxxxxxxxxx> wrote:
>> Andrey Korolyov <andrey@xxxxxxx> writes:
>>
>>> On Fri, Mar 6, 2015 at 1:14 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
>>>> Hello,
>>>>
>>>> recently I've got a couple of shiny new Intel 2620v2s for future
>>>> replacement of the E5-2620v1, but I experienced relatively many events
>>>> with emulation errors; all traces look similar to the one below. I am
>>>> running qemu-2.1 on x86 on top of the 3.10 branch for testing purposes but
>>>> can switch to some other versions if necessary. Most of the crashes
>>>> happened during a reboot cycle or at the end of an ACPI-based shutdown
>>>> action, if this can help. I have zero clues as to what can introduce such
>>>> a mess within the same processor family using identical software, as
>>>> the 2620v1 never had a similar problem. Please let me know if there are
>>>> some side measures for making the entire story more clear.
>>>>
>>>> Thanks!
>>>>
>>>> KVM internal error. Suberror: 2
>>>> extra data[0]: 800000d1
>>>> extra data[1]: 80000b0d
>>>> EAX=00000003 EBX=00000000 ECX=00000000 EDX=00000000
>>>> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00006cd4
>>>> EIP=0000d3f9 EFL=00010202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>>> ES =0000 00000000 0000ffff 00009300
>>>> CS =f000 000f0000 0000ffff 00009b00
>>>> SS =0000 00000000 0000ffff 00009300
>>>> DS =0000 00000000 0000ffff 00009300
>>>> FS =0000 00000000 0000ffff 00009300
>>>> GS =0000 00000000 0000ffff 00009300
>>>> LDT=0000 00000000 0000ffff 00008200
>>>> TR =0000 00000000 0000ffff 00008b00
>>>> GDT= 000f6e98 00000037
>>>> IDT= 00000000 000003ff
>>>> CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
>>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
>>>> DR3=0000000000000000
>>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>>> EFER=0000000000000000
>>>> Code=48 18 67 8c 00 8c d1 8e d9 66 5a 66 58 66 5d 66 c3 cd 02 cb <cd>
>>>> 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb cd 19 cb cd 1c cb fa fc 66
>>>> b8 00 e0 00 00 8e
>>>
>>>
>>> It turns out that those errors are introduced by APICv, which gets
>>> enabled due to the different feature set. If anyone is interested in
>>> reproducing/fixing this exactly on 3.10, it takes about one hundred
>>> migrations/power state changes for the issue to appear; the guest OS can be
>>> Linux or Win.
>>
>> Are you able to reproduce this on a more recent upstream kernel as well?
>>
>> Bandan
>
> I'll go through a test cycle with 3.18 and 2603v2 around tomorrow and
> follow up with any reproducible results.

Heh.. the issue is not triggered on the 2603v2 at all, at least I am not
able to hit it. The only difference from the 2620v2, apart from the lower
frequency, is the Intel Dynamic Acceleration feature. I'd appreciate any
testing with higher CPU models with the same or a richer feature set. The
testing itself can be done on either generic 3.10 or RH7 kernels, as
both of them are experiencing this issue.
I conducted all tests with C-states disabled, so I advise doing the same
for a first reproduction step.

Thanks!

model name      : Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
stepping        : 4
microcode       : 0x416
cpu MHz         : 2100.039
cache size      : 15360 KB
siblings        : 12
apicid          : 43
initial apicid  : 43
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb
xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase
smep erms
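
For anyone reading the trace quoted above, here is a small sketch for
decoding the two "extra data" words. It rests on an assumption that is not
stated anywhere in this thread: that for suberror 2
(KVM_INTERNAL_ERROR_SIMUL_EX in the kernel headers) the vmx code reports
the IDT-vectoring information field in data[0] and the VM-exit
interruption information field in data[1], both laid out as in the Intel
SDM (vector in bits 7:0, type in bits 10:8, error-code-valid in bit 11,
valid in bit 31). The helper name decode_event_info is made up for this
example.

```python
# Sketch only. Assumes data[0] is the IDT-vectoring information field and
# data[1] the VM-exit interruption information field (not confirmed in
# this thread).

EVENT_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "hardware exception",
    4: "software interrupt",
    5: "privileged software exception",
    6: "software exception",
}


def decode_event_info(word):
    """Split a 32-bit VMX event-information word into its fields."""
    return {
        "valid": bool(word & (1 << 31)),             # bit 31
        "vector": word & 0xFF,                       # bits 7:0
        "type": EVENT_TYPES.get((word >> 8) & 0x7, "reserved"),  # bits 10:8
        "error_code_valid": bool(word & (1 << 11)),  # bit 11
    }


# Values taken from the trace: extra data[0] and extra data[1].
vect_info = decode_event_info(0x800000D1)
intr_info = decode_event_info(0x80000B0D)

for name, info in (("data[0]", vect_info), ("data[1]", intr_info)):
    print(name, "vector=0x%02x" % info["vector"], info["type"],
          "valid=%s" % info["valid"],
          "error_code_valid=%s" % info["error_code_valid"])
```

Under that reading, data[1] describes a hardware exception with vector
0x0d (#GP) and a valid error code, raised while an external interrupt with
vector 0xd1 (data[0]) was being delivered. Please treat this as a reading
aid rather than a diagnosis.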