* Andrey Korolyov (andrey@xxxxxxx) wrote:
> On Wed, Mar 11, 2015 at 10:59 PM, Dr. David Alan Gilbert
> <dgilbert@xxxxxxxxxx> wrote:
> > * Andrey Korolyov (andrey@xxxxxxx) wrote:
> >> On Wed, Mar 11, 2015 at 10:33 PM, Dr. David Alan Gilbert
> >> <dgilbert@xxxxxxxxxx> wrote:
> >> > * Kevin O'Connor (kevin@xxxxxxxxxxxx) wrote:
> >> >> On Wed, Mar 11, 2015 at 02:45:31PM -0400, Kevin O'Connor wrote:
> >> >> > On Wed, Mar 11, 2015 at 02:40:39PM -0400, Kevin O'Connor wrote:
> >> >> > > For what it's worth, I can't seem to trigger the problem if I move the
> >> >> > > cmos read above the SIPI/LAPIC code (see patch below).
> >> >> >
> >> >> > Ugh!
> >> >> >
> >> >> > That's a seabios bug. Main processor modifies the rtc index
> >> >> > (rtc_read()) while APs try to clear the NMI bit by modifying the rtc
> >> >> > index (romlayout.S:transition32).
> >> >> >
> >> >> > I'll put together a fix.
> >> >>
> >> >> The seabios patch below resolves the issue for me.
> >> >
> >> > Thanks! Looks good here.
> >> >
> >> > Andrey, Paolo, Bandan: Does it fix it for you as well?
> >> >
> >>
> >> Thanks Kevin, Dave,
> >>
> >> I`m afraid that I`m hitting something different not only because
> >> different suberror code but also because of mine version of seabios -
> >> I am using 1.7.5 and corresponding code in the proposed patch looks
> >> different - there is no smp-related code patch is about of. Those
> >> mentioned devices went to production successfully and I`m afraid I
> >> cannot afford playing on them anymore, even if I re-trigger the issue
> >> with patched 1.8.1-rc, there is no way to switch to a different kernel
> >> and retest due to specific conditions of this production suite. I`ve
> >> ordered a pair of new shoes^W 2620v2-s which should arrive to me next
> >
> > Well I was testing on a pair of 'E5-2620 v2'; but as you saw my test case
> > was pretty simple. If you can suggest any flags I should add etc to the
> > test I'd be happy to give it a go.
> >
> > Dave
>
> Here is mine launch string:
>
> qemu-system-x86_64 -enable-kvm -name vmtest -S -machine
> pc-i440fx-2.1,accel=kvm,usb=off -cpu SandyBridge,+kvm_pv_eoi -m 512
> -realtime mlock=off -smp 12,sockets=1,cores=12,threads=12 -numa
> node,nodeid=0,cpus=0-11,mem=512 -nographic -no-user-config -nodefaults
> -device sga -rtc base=utc,driftfix=slew -global
> kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global
> PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on
> -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x4 -device
> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -m
> 512,slots=31,maxmem=16384M -object
> memory-backend-ram,id=mem0,size=512M -device
> pc-dimm,id=dimm0,node=0,memdev=mem0
>
> I omitted disk backend in this example, but there is a chance that my
> problem is not reproducible without some calls made explicitly by a
> bootloader (not sure what to say for mid-runtime failures).

It seems to survive OK:

while true; do (sleep 1; echo -e '\001cc\n'; sleep 5; echo -e 'q\n')|/opt/qemu-try-world3/bin/qemu-system-x86_64 -enable-kvm -name vmtest -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu SandyBridge,+kvm_pv_eoi -m 512 -realtime mlock=off -smp 12,sockets=1,cores=12,threads=12 -numa node,nodeid=0,cpus=0-11,mem=512 -nographic -no-user-config -device sga -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -m 512,slots=31,maxmem=16384M -object memory-backend-ram,id=mem0,size=512M -device pc-dimm,id=dimm0,node=0,memdev=mem0 ~/pi.vfd 2>&1 | tee /tmp/qemu.op; grep "internal error" /tmp/qemu.op -q && break; done

Dave

>
> >
> >> Monday, so I`ll be able to test a) against 1.8.0-release, b) against
> >> patched bios code, c) reproduce initial error on master/3.19 (may be
> >> I`ll take them before weekend by going into this computer shop in
> >> person). Until then, I have a very deep feeling that mine issue is not
> >> there :) Also I became very curious on how a lack of IDT feature may
> >> completely eliminate the issue appearance for me, the only possible
> >> explanation is a clock-related race which is kinda stupid suggestion
> >> and unlikely to exist in nature.
> >>
> >> Thanks again for everyone for throughout testing and ideas!
> >>
> >> >
> >> >> -Kevin
> >> >>
> >> >>
> >> >> --- a/src/romlayout.S
> >> >> +++ b/src/romlayout.S
> >> >> @@ -22,7 +22,8 @@
> >> >>  // %edx = return location (in 32bit mode)
> >> >>  // Clobbers: ecx, flags, segment registers, cr0, idt/gdt
> >> >>          DECLFUNC transition32
> >> >> -transition32_for_smi:
> >> >> +transition32_nmi_off:
> >> >> +        // transition32 when NMI and A20 are already initialized
> >> >>          movl %eax, %ecx
> >> >>          jmp 1f
> >> >>  transition32:
> >> >> @@ -205,7 +206,7 @@ __farcall16:
> >> >>  entry_smi:
> >> >>          // Transition to 32bit mode.
> >> >>          movl $1f + BUILD_BIOS_ADDR, %edx
> >> >> -        jmp transition32_for_smi
> >> >> +        jmp transition32_nmi_off
> >> >>          .code32
> >> >>  1:      movl $BUILD_SMM_ADDR + 0x8000, %esp
> >> >>          calll _cfunc32flat_handle_smi - BUILD_BIOS_ADDR
> >> >> @@ -216,8 +217,10 @@ entry_smi:
> >> >>          DECLFUNC entry_smp
> >> >>  entry_smp:
> >> >>          // Transition to 32bit mode.
> >> >> +        cli
> >> >> +        cld
> >> >>          movl $2f + BUILD_BIOS_ADDR, %edx
> >> >> -        jmp transition32
> >> >> +        jmp transition32_nmi_off
> >> >>          .code32
> >> >>          // Acquire lock and take ownership of shared stack
> >> >>  1:      rep ; nop
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK
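
For anyone who wants to reuse the reproduction loop above with a different launch string, here is a minimal sketch of the same idea: rerun a command, capture stdout and stderr to a log, and stop as soon as the log contains a marker such as "internal error". The function name run_until_match and its arguments are hypothetical and not part of QEMU or SeaBIOS. Unlike the original loop, this sketch assumes an upper bound on the number of runs and writes the output only to the log file instead of also showing it through tee.

```shell
#!/bin/sh
# Hypothetical helper (illustration only, not part of QEMU or SeaBIOS).
# Usage: run_until_match MAX PATTERN LOGFILE CMD [ARGS...]
# Runs CMD up to MAX times. After each run it searches LOGFILE for
# PATTERN; returns 0 on the first match, 1 if no run produced it.
run_until_match() {
    max=$1
    pattern=$2
    log=$3
    shift 3

    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        # Capture stdout and stderr together, like "2>&1" in the loop above.
        "$@" > "$log" 2>&1
        if grep -q -- "$pattern" "$log"; then
            echo "matched on run $i"
            return 0
        fi
    done
    echo "no match in $max runs"
    return 1
}

# Example wiring (commented out, assumes a qemu binary in PATH):
# boot_once() {
#     (sleep 1; printf '\001cc\n'; sleep 5; printf 'q\n') |
#         qemu-system-x86_64 -enable-kvm -nographic -S "$@"
# }
# run_until_match 100 "internal error" /tmp/qemu.op boot_once -smp 12
```

Wrapping the monitor input and the QEMU invocation in a function such as boot_once keeps the per-run stdin pipe inside each iteration, which is what the original one-liner does with its subshell.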