On Wed, Mar 11, 2015 at 03:53:07PM +0000, Dr. David Alan Gilbert wrote:
> * Kevin O'Connor (kevin@xxxxxxxxxxxx) wrote:
> > On Wed, Mar 11, 2015 at 01:45:57PM +0000, Dr. David Alan Gilbert wrote:
> > > * Bandan Das (bsd@xxxxxxxxxx) wrote:
> > > > "Dr. David Alan Gilbert" <dgilbert@xxxxxxxxxx> writes:
> > > > > while true; do (sleep 5; echo -e '\001cq\n')|/opt/qemu-try-world3/bin/qemu-system-x86_64 -machine pc-i440fx-2.0,accel=kvm -m 1024 -smp 128 -nographic -device sga 2>&1 | tee /tmp/qemu.op; grep "internal error" /tmp/qemu.op -q && break; done

That is a truly impressive command line, BTW.

> > > > > [root@virtlab413 qemu-world3]# git bisect bad
> > > > > 21f5826a04d38e19488f917e1eef22751490c769 is the first bad commit
> > > >
> > > > I can reproduce this on E5-2620 v2 with David's "while true" test.
> > > > (The emulation failure I mean, not the suberror 2 that Andrey is seeing)
> > > > The commit that seems to have introduced this is -
> > > >
> > > > commit 0673b7870063a3affbad9046fb6d385a4e734c19
> > > > Author: Kevin O'Connor <kevin@xxxxxxxxxxxx>
> > > > Date: Sat May 24 10:49:50 2014 -0400
> > > >
> > > > smp: Replace QEMU SMP init assembler code with C; run only in 32bit mode.
> > [...]
> > > Turning on debug logging
> > > ( -chardev file,id=log,path=/tmp/debugcon.$$ -device isa-debugcon,chardev=log,iobase=0x402 )
> > >
> > > SeaBIOS (version rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org)
> > [...]
> > > Found 1 cpu(s) max supported 128 cpu(s)
> >
> > Something is very odd here.  When I run the above command (on an older
> > AMD machine) I get:
> >
> > Found 128 cpu(s) max supported 128 cpu(s)
> >
> > That first value (1 vs 128) comes from QEMU (via cmos index 0x5f).
> > That is, during smp init, SeaBIOS expects QEMU to tell it how many
> > cpus are active, and SeaBIOS waits until that many CPUs check in from
> > its SIPI request before proceeding.
> >
> > I wonder if QEMU reported only 1 active cpu via that cmos register,
> > but more were actually active.  If that was the case, it could
> > certainly explain the failure - as multiple cpus could be running
> > without the sipi trampoline in place.
> >
> > What does the log look like on a non-failure case?
>
> I had to drop down from 128 to get a working run with debug; here
> are two runs with -smp 20 the first one worked, the second one
> failed.
[...]
> =========== Working ===========
>
> SeaBIOS (version rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org)
[...]
> Found 20 cpu(s) max supported 20 cpu(s)
[...]
> =========== Broken ===========
>
> SeaBIOS (version rel-1.8.0-0-g4c59f5d-20150219_092859-nilsson.home.kraxel.org)
[...]
> Found 1 cpu(s) max supported 20 cpu(s)

So, I couldn't get this to fail on my older AMD machine at all with
the default SeaBIOS code.  But, when I change the code with the patch
below, it failed right away.

KVM internal error. Suberror: 1
emulation failure
EAX=00000000 EBX=00000000 ECX=00000000 EDX=000fd2b8
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=000fd2c1 EFL=00000007 [-----PC] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008300 DPL=0 TSS16-busy
GDT= 000f6a50 00000037
IDT= 000f6a8e 00000000
CR0=60000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=66 ba b8 d2 0f 00 e9 a2 fe f3 90 f0 0f ba 2d 04 ff fb 3f 00 <72> f3 8b 25 00 ff fb 3f e8 d2 65 ff ff c7 05 04 ff fb 3f 00 00 00 00 f4 eb fd fa fc 66 b8

And the
failed debug output looks like:

SeaBIOS (version rel-1.8.0-7-gd23eba6-dirty-20150311_121819-morn.localdomain)
[...]
cmos_smp_count0=20
[...]
cmos_smp_count=1
cmos_smp_count2=1/20
Found 1 cpu(s) max supported 20 cpu(s)

I'm going to check the assembly for a compiler error, but is it
possible QEMU is returning incorrect data in cmos index 0x5f?

David, any chance you can recompile seabios and double check your
output?

-Kevin


--- a/src/fw/smp.c
+++ b/src/fw/smp.c
@@ -128,6 +128,7 @@ smp_setup(void)
 
     // Wait for other CPUs to process the SIPI.
     u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1;
+    dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count);
     while (cmos_smp_count != CountCPUs)
         asm volatile(
             // Release lock and allow other processors to use the stack.
@@ -140,6 +141,8 @@ smp_setup(void)
             : "+m" (SMPLock), "+m" (SMPStack)
             : : "cc", "memory");
     yield();
+    dprintf(1, "cmos_smp_count2=%d/%d\n", cmos_smp_count
+            , rtc_read(CMOS_BIOS_SMP_COUNT) + 1);
 
     // Restore memory.
     *(u64*)BUILD_AP_BOOT_ADDR = old;
diff --git a/src/post.c b/src/post.c
index 9ea5620..dc11c72 100644
--- a/src/post.c
+++ b/src/post.c
@@ -170,6 +170,7 @@ platform_hardware_setup(void)
     clock_setup();
 
     // Platform specific setup
+    dprintf(1, "cmos_smp_count0=%d\n", rtc_read(CMOS_BIOS_SMP_COUNT) + 1);
     qemu_platform_setup();
     coreboot_platform_setup();
 }
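
To make the suspected failure mode concrete, here is a small standalone
model of the wait condition in smp_setup().  It is only a sketch, not
SeaBIOS code: pending_aps_at_exit() and its parameters are hypothetical
names, and it assumes the boot CPU starts the check-in counter at 1 and
that each secondary CPU increments it once.  It does not touch real CMOS
or hardware.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/* Expected CPU count as computed in smp_setup(): raw CMOS byte plus one,
 * stored in a u8 (so it is truncated to 8 bits). */
u8 expected_cpus(u8 cmos_raw)
{
    return (u8)(cmos_raw + 1);
}

/* Hypothetical model of the boot CPU's wait loop.
 *   cmos_raw  - value the firmware read back from CMOS index 0x5f
 *   real_cpus - number of CPUs that were actually started
 * Returns how many secondary CPUs have NOT yet checked in at the moment
 * the boot CPU leaves the loop and goes on to restore memory. */
unsigned pending_aps_at_exit(u8 cmos_raw, unsigned real_cpus)
{
    assert(real_cpus >= 1);

    u8 expected = expected_cpus(cmos_raw);
    unsigned count = 1;             /* assumption: boot CPU counts itself */

    /* Same exit condition as "while (cmos_smp_count != CountCPUs)",
     * with a bound so the model always terminates. */
    while (expected != count && count < real_cpus)
        count++;                    /* one more secondary CPU checks in */

    return real_cpus - count;
}
```

With a raw value of 19 and 20 CPUs started, pending_aps_at_exit(19, 20)
returns 0.  With a raw value of 0 it returns 19: in this model the boot
CPU would leave the loop at once and restore the memory at
BUILD_AP_BOOT_ADDR while 19 CPUs have yet to run the SIPI code, which is
consistent with the "Found 1 cpu(s) max supported 20 cpu(s)" output.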