Re: [Qemu-devel] E5-2620v2 - emulation stop error

Andrey Korolyov <andrey@xxxxxxx> · Thu, 12 Mar 2015 13:47:39 +0300

On Thu, Mar 12, 2015 at 12:59 PM, Dr. David Alan Gilbert
<dgilbert@xxxxxxxxxx> wrote:
> * Andrey Korolyov (andrey@xxxxxxx) wrote:
>> On Wed, Mar 11, 2015 at 10:59 PM, Dr. David Alan Gilbert
>> <dgilbert@xxxxxxxxxx> wrote:
>> > * Andrey Korolyov (andrey@xxxxxxx) wrote:
>> >> On Wed, Mar 11, 2015 at 10:33 PM, Dr. David Alan Gilbert
>> >> <dgilbert@xxxxxxxxxx> wrote:
>> >> > * Kevin O'Connor (kevin@xxxxxxxxxxxx) wrote:
>> >> >> On Wed, Mar 11, 2015 at 02:45:31PM -0400, Kevin O'Connor wrote:
>> >> >> > On Wed, Mar 11, 2015 at 02:40:39PM -0400, Kevin O'Connor wrote:
>> >> >> > > For what it's worth, I can't seem to trigger the problem if I move the
>> >> >> > > cmos read above the SIPI/LAPIC code (see patch below).
>> >> >> >
>> >> >> > Ugh!
>> >> >> >
>> >> >> > That's a SeaBIOS bug.  The main processor modifies the RTC index
>> >> >> > (rtc_read()) while the APs try to clear the NMI bit by modifying
>> >> >> > the RTC index (romlayout.S:transition32).
>> >> >> >
>> >> >> > I'll put together a fix.
>> >> >>
>> >> >> The SeaBIOS patch below resolves the issue for me.
>> >> >
>> >> > Thanks! Looks good here.
>> >> >
>> >> > Andrey, Paolo, Bandan: Does it fix it for you as well?
>> >> >
>> >>
>> >> Thanks Kevin, Dave,
>> >>
>> >> I'm afraid that I'm hitting something different, not only because of
>> >> the different suberror code but also because of my SeaBIOS version:
>> >> I am using 1.7.5, and the corresponding code there looks different
>> >> from what the proposed patch touches; the SMP-related code the patch
>> >> is about is not there. The devices I mentioned went into production
>> >> successfully and I'm afraid I cannot afford to experiment on them
>> >> anymore: even if I re-trigger the issue with a patched 1.8.1-rc,
>> >> there is no way to switch to a different kernel and retest due to the
>> >> specific conditions of this production setup. I've ordered a pair of
>> >> new shoes^W 2620v2s which should arrive next
>> >
>> > Well I was testing on a pair of 'E5-2620 v2'; but as you saw my test case
>> > was pretty simple.  If you can suggest any flags I should add etc to the
>> > test I'd be happy to give it a go.
>> >
>> > Dave
>>
>> Here is my launch string:
>>
>> qemu-system-x86_64 -enable-kvm -name vmtest -S \
>>   -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu SandyBridge,+kvm_pv_eoi \
>>   -m 512 -realtime mlock=off \
>>   -smp 12,sockets=1,cores=12,threads=12 \
>>   -numa node,nodeid=0,cpus=0-11,mem=512 \
>>   -nographic -no-user-config -nodefaults -device sga \
>>   -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard \
>>   -no-hpet -no-shutdown \
>>   -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 \
>>   -boot strict=on \
>>   -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x4 \
>>   -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
>>   -m 512,slots=31,maxmem=16384M \
>>   -object memory-backend-ram,id=mem0,size=512M \
>>   -device pc-dimm,id=dimm0,node=0,memdev=mem0
>>
>> I omitted the disk backend in this example, but there is a chance that
>> my problem is not reproducible without some calls made explicitly by a
>> bootloader (I'm not sure what to say about the mid-runtime failures).
>
> It seems to survive OK:

Thanks David, I'll go through the test sequence and report back.
Unfortunately my orchestration does not even have hundred-millisecond
precision for libvirt events, so I can't tell whether the immediate
start-up failures happened before or during bootloader execution; all I
have for those is an interval of less than two seconds between actually
issuing the launch command and the paused-state event. QEMU logging also
does not give me timestamps for emulation errors, even with the
appropriate timestamp argument.

>
> while true; do
>   (sleep 1; echo -e '\001cc\n'; sleep 5; echo -e 'q\n') |
>     /opt/qemu-try-world3/bin/qemu-system-x86_64 -enable-kvm -name vmtest -S \
>       -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu SandyBridge,+kvm_pv_eoi \
>       -m 512 -realtime mlock=off -smp 12,sockets=1,cores=12,threads=12 \
>       -numa node,nodeid=0,cpus=0-11,mem=512 -nographic -no-user-config \
>       -device sga -rtc base=utc,driftfix=slew \
>       -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown \
>       -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 \
>       -boot strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x4 \
>       -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
>       -m 512,slots=31,maxmem=16384M \
>       -object memory-backend-ram,id=mem0,size=512M \
>       -device pc-dimm,id=dimm0,node=0,memdev=mem0 ~/pi.vfd 2>&1 |
>     tee /tmp/qemu.op
>   grep "internal error" /tmp/qemu.op -q && break
> done
>
> Dave
>
>>
>> >
>> >> Monday, so I'll be able to a) test against the 1.8.0 release, b) test
>> >> against the patched BIOS code, and c) reproduce the initial error on
>> >> master/3.19 (maybe I'll get them before the weekend by going to the
>> >> computer shop in person). Until then, I have a strong feeling that my
>> >> issue is not there :) I also became very curious about how the lack
>> >> of the IDT feature could completely eliminate the issue for me; the
>> >> only possible explanation I see is a clock-related race, which is
>> >> kind of a silly suggestion and unlikely to exist in nature.
>> >>
>> >> Thanks again to everyone for the thorough testing and ideas!
>> >>
>> >> >
>> >> >> -Kevin
>> >> >>
>> >> >>
>> >> >> --- a/src/romlayout.S
>> >> >> +++ b/src/romlayout.S
>> >> >> @@ -22,7 +22,8 @@
>> >> >>  // %edx = return location (in 32bit mode)
>> >> >>  // Clobbers: ecx, flags, segment registers, cr0, idt/gdt
>> >> >>          DECLFUNC transition32
>> >> >> -transition32_for_smi:
>> >> >> +transition32_nmi_off:
>> >> >> +        // transition32 when NMI and A20 are already initialized
>> >> >>          movl %eax, %ecx
>> >> >>          jmp 1f
>> >> >>  transition32:
>> >> >> @@ -205,7 +206,7 @@ __farcall16:
>> >> >>  entry_smi:
>> >> >>          // Transition to 32bit mode.
>> >> >>          movl $1f + BUILD_BIOS_ADDR, %edx
>> >> >> -        jmp transition32_for_smi
>> >> >> +        jmp transition32_nmi_off
>> >> >>          .code32
>> >> >>  1:      movl $BUILD_SMM_ADDR + 0x8000, %esp
>> >> >>          calll _cfunc32flat_handle_smi - BUILD_BIOS_ADDR
>> >> >> @@ -216,8 +217,10 @@ entry_smi:
>> >> >>          DECLFUNC entry_smp
>> >> >>  entry_smp:
>> >> >>          // Transition to 32bit mode.
>> >> >> +        cli
>> >> >> +        cld
>> >> >>          movl $2f + BUILD_BIOS_ADDR, %edx
>> >> >> -        jmp transition32
>> >> >> +        jmp transition32_nmi_off
>> >> >>          .code32
>> >> >>          // Acquire lock and take ownership of shared stack
>> >> >>  1:      rep ; nop
>> >> > --
>> >> > Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK