Hi Kevin & Paolo, Luckily, I reproduced this problem last night. And I got the below log when SeaBIOS is stuck. [BTW, the whole SeaBIOS log attached] [2015-12-18 10:38:10] >>>>>gonglei: enter smp_setup()... [2015-12-18 10:38:10] >>>>>gonglei: begine to enable local APIC... [2015-12-18 10:38:10] >>>>>gonglei: finish enable local APIC... [2015-12-18 10:38:10] >>>gonglei: cmos_smp_count=8 [2015-12-18 10:38:10] >>> enter handle_smp... [2015-12-18 10:38:10] handle_smp: apic_id=5 [2015-12-18 10:38:10] ===: CountCPUs=2, SMPStack=0x6d84 [2015-12-18 10:38:10] >>> enter handle_smp... [2015-12-18 10:38:10] handle_smp: apic_id=7 [2015-12-18 10:38:10] ===: CountCPUs=3, SMPStack=0x6d84 [2015-12-18 10:38:10] >>> enter handle_smp... [2015-12-18 10:38:10] handle_smp: apic_id=1 [2015-12-18 10:38:10] ===: CountCPUs=4, SMPStack=0x6d84 [2015-12-18 10:38:10] >>> enter handle_smp... [2015-12-18 10:38:10] handle_smp: apic_id=2 [2015-12-18 10:38:10] ===: CountCPUs=5, SMPStack=0x6d84 [2015-12-18 10:38:10] >>> enter handle_smp... [2015-12-18 10:38:10] handle_smp: apic_id=4 [2015-12-18 10:38:10] ===: CountCPUs=6, SMPStack=0x6d84 [2015-12-18 10:38:10] >>> enter handle_smp... [2015-12-18 10:38:10] handle_smp: apic_id=3 [2015-12-18 10:38:10] ===: CountCPUs=7, SMPStack=0x6d84 [2015-12-18 10:38:10] >>> enter handle_smp... [2015-12-18 10:38:10] handle_smp: apic_id=6 [2015-12-18 10:38:10] ===: CountCPUs=8, SMPStack=0x6d84 [2015-12-18 10:38:10] gonglei: finish while [pid 31509 is a vcpu thread used 100% cpu overhead] # cat /proc/31509/stack [<ffffffffa05c337c>] vmx_vcpu_run+0x35c/0x580 [kvm_intel] [<ffffffffa06b0f10>] em_push+0x0/0x20 [kvm] [<ffffffffa06a30dc>] x86_emulate_instruction+0x20c/0x440 [kvm] [<ffffffffa05c9224>] handle_exception+0xe4/0x1b58 [kvm_intel] [<ffffffffa06a24c5>] vcpu_enter_guest+0x565/0x790 [kvm] [<ffffffffa05bf990>] vmx_get_segment_base+0x0/0xb0 [kvm_intel] [<ffffffffa06a2888>] __vcpu_run+0x198/0x260 [kvm] [<ffffffffa06a3508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] [<ffffffffa068f92e>] vcpu_load+0x4e/0x80 [kvm] [<ffffffffa068fcee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] [<ffffffff8109618d>] futex_wake+0xfd/0x110 [<ffffffff811ed57c>] security_file_permission+0x1c/0xa0 [<ffffffff8116bedb>] do_vfs_ioctl+0x8b/0x3b0 [<ffffffff8116c2a1>] sys_ioctl+0xa1/0xb0 [<ffffffff81469272>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff And kvm tracing information: <...>-31509 [035] 154753.180077: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306 <...>-31509 [035] 154753.180077: kvm_emulate_insn: 0:3:f0 53 (real) <...>-31509 [035] 154753.180077: kvm_inj_exception: #UD (0x0) <...>-31509 [035] 154753.180077: kvm_entry: vcpu 0 <...>-31509 [035] 154753.180078: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306 <...>-31509 [035] 154753.180078: kvm_emulate_insn: 0:3:f0 53 (real) <...>-31509 [035] 154753.180079: kvm_inj_exception: #UD (0x0) <...>-31509 [035] 154753.180079: kvm_entry: vcpu 0 <...>-31509 [035] 154753.180079: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306 <...>-31509 [035] 154753.180080: kvm_emulate_insn: 0:3:f0 53 (real) <...>-31509 [035] 154753.180080: kvm_inj_exception: #UD (0x0) <...>-31509 [035] 154753.180080: kvm_entry: vcpu 0 <...>-31509 [035] 154753.180081: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306 <...>-31509 [035] 154753.180081: kvm_emulate_insn: 0:3:f0 53 (real) <...>-31509 [035] 154753.180081: kvm_inj_exception: #UD (0x0) <...>-31509 [035] 154753.180081: kvm_entry: vcpu 0 <...>-31509 [035] 154753.180082: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306 <...>-31509 [035] 154753.180083: kvm_emulate_insn: 0:3:f0 53 (real) <...>-31509 [035] 154753.180083: kvm_inj_exception: #UD (0x0) <...>-31509 [035] 154753.180083: kvm_entry: vcpu 0 <...>-31509 [035] 154753.180084: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306 <...>-31509 [035] 154753.180084: kvm_emulate_insn: 0:3:f0 53 (real) <...>-31509 [035] 154753.180084: kvm_inj_exception: #UD (0x0) <...>-31509 [035] 154753.180084: kvm_entry: vcpu 0 <...>-31509 [035] 154753.180085: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306 <...>-31509 [035] 154753.180085: kvm_emulate_insn: 0:3:f0 53 (real) <...>-31509 [035] 154753.180085: kvm_inj_exception: #UD (0x0) <...>-31509 [035] 154753.180085: kvm_entry: vcpu 0 <...>-31509 [035] 154753.180086: kvm_exit: reason EXCEPTION_NMI rip 0x3 info 0 80000306 Now, it's very clear that the guest stuck in yiled(), and then kvm encounter the exception #UD. Do you have any thoughts? Thanks! The Seabios patch below: diff --git a/roms/seabios/src/boot.c b/roms/seabios/src/boot.c index f23e9e1..552914a 100644 --- a/roms/seabios/src/boot.c +++ b/roms/seabios/src/boot.c @@ -93,7 +93,7 @@ glob_prefix(const char *glob, const char *str) static int find_prio(const char *glob) { - dprintf(1, "Searching bootorder for: %s\n", glob); + //dprintf(1, "Searching bootorder for: %s\n", glob); int i; for (i = 0; i < BootorderCount; i++) if (glob_prefix(glob, Bootorder[i])) diff --git a/roms/seabios/src/fw/smp.c b/roms/seabios/src/fw/smp.c index a466ea6..46ec607 100644 --- a/roms/seabios/src/fw/smp.c +++ b/roms/seabios/src/fw/smp.c @@ -46,12 +46,16 @@ int apic_id_is_present(u8 apic_id) return !!(FoundAPICIDs[apic_id/32] & (1ul << (apic_id % 32))); } +// Atomic lock for shared stack across processors. +u32 SMPLock __VISIBLE; +u32 SMPStack __VISIBLE; + void VISIBLE32FLAT handle_smp(void) { if (!CONFIG_QEMU) return; - + dprintf(1, ">>> enter handle_smp...\n"); // Enable CPU caching setcr0(getcr0() & ~(CR0_CD|CR0_NW)); @@ -70,19 +74,16 @@ handle_smp(void) FoundAPICIDs[apic_id/32] |= (1 << (apic_id % 32)); CountCPUs++; + dprintf(1, "===: CountCPUs=%d, SMPStack=0x%x\n", CountCPUs, SMPStack); } -// Atomic lock for shared stack across processors. -u32 SMPLock __VISIBLE; -u32 SMPStack __VISIBLE; - // find and initialize the CPUs by launching a SIPI to them void smp_setup(void) { if (!CONFIG_QEMU) return; - + dprintf(1, ">>>>>gonglei: enter smp_setup()...\n"); ASSERT32FLAT(); u32 eax, ebx, ecx, cpuid_features; cpuid(1, &eax, &ebx, &ecx, &cpuid_features); @@ -106,7 +107,7 @@ smp_setup(void) u64 new = (0xea | ((u64)SEG_BIOS<<24) | (((u32)entry_smp - BUILD_BIOS_ADDR) << 8)); *(u64*)BUILD_AP_BOOT_ADDR = new; - + dprintf(1, ">>>>>gonglei: begine to enable local APIC...\n"); // enable local APIC u32 val = readl(APIC_SVR); writel(APIC_SVR, val | APIC_ENABLED); @@ -125,10 +126,21 @@ smp_setup(void) writel(APIC_ICR_LOW, 0x000C4500); u32 sipi_vector = BUILD_AP_BOOT_ADDR >> 12; writel(APIC_ICR_LOW, 0x000C4600 | sipi_vector); - + dprintf(1, ">>>>>gonglei: finish enable local APIC...\n"); // Wait for other CPUs to process the SIPI. u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1; - while (cmos_smp_count != CountCPUs) + dprintf(1, ">>>gonglei: cmos_smp_count=%d\n", cmos_smp_count); + while (cmos_smp_count != CountCPUs) { asm volatile( // Release lock and allow other processors to use the stack. " movl %%esp, %1\n" @@ -139,8 +151,11 @@ smp_setup(void) " jc 1b\n" : "+m" (SMPLock), "+m" (SMPStack) : : "cc", "memory"); + //yield(); + } + dprintf(1, " gonglei: finish while\n"); yield(); - + dprintf(1, " gonglei: finish yield\n"); // Restore memory. *(u64*)BUILD_AP_BOOT_ADDR = old; diff --git a/roms/seabios/src/misc.c b/roms/seabios/src/misc.c index 8caaf31..77f6be3 100644 --- a/roms/seabios/src/misc.c +++ b/roms/seabios/src/misc.c @@ -64,6 +64,10 @@ void VISIBLE16 handle_02(void) { debug_isr(DEBUG_ISR_02); + dprintf(1, "gonglei hand nmi inject, write rtc \n"); } void diff --git a/roms/seabios/src/stacks.c b/roms/seabios/src/stacks.c index 1dbdfe9..c1b5203 100644 --- a/roms/seabios/src/stacks.c +++ b/roms/seabios/src/stacks.c @@ -174,6 +174,7 @@ call16_smm(u32 eax, u32 edx, void *func) static void call32_sloppy_prep(void) { + dprintf(1, ">>> enter call32_sloppy_prep...\n"); // Backup cmos index register and disable nmi u8 cmosindex = inb(PORT_CMOS_INDEX); outb(cmosindex | NMI_DISABLE_BIT, PORT_CMOS_INDEX); Regards, -Gonglei > -----Original Message----- > From: Kevin O'Connor [mailto:kevin@xxxxxxxxxxxx] > Sent: Thursday, November 19, 2015 9:41 PM > To: Xulei (Stone) > Cc: Gonglei (Arei); qemu-devel; seabios@xxxxxxxxxxx; Huangweidong (C) > Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy > problem on qemu-kvm platform > > On Thu, Nov 19, 2015 at 12:42:50PM +0000, Xulei (Stone) wrote: > > Kevin, > > > > After deeply analyzing, i think there may be 3 possible reasons: > > 1)wrong CountCPUs value. It seems CountCPUs++ in handle_smp() has no > > lock to protect. So, sometimes, 2 or more vcpu may get the same > > current value of CountCPUs. Then we'll get a single incrementation > > instead of 2 or more and "while (cmos_smp_count != CountCPUs)" will > > loop forever; > > The handle_smp() code is called from romlayout.S:entry_smp() which does take > a lock. So, all of handle_smp() should run synchronous. > > > 2)wrong cmos_smp_count value. SeaBIOS rtc reads an incorrect number? > > Not sure - the last time there were problems in this area of the code others > used kvmtrace to try and track this down. Since you are getting dprintf > statements, you could also try outputting cmos_smp_count prior to the loop > (see patch below). > > > 3)yield() stuck. Is it possible that SeaBIOS is stuck during yield? > > I've tested, when yield() is running, SeaBIOS seems has not created > > some other threads except the main thread. So I don't know what's the > > function of yield() here.? > > The yield() allows hardware interrupts to occur. But note that > yield() isn't called in the loop - is is only called after the loop completes. > > If you are only getting this on massive repetitive reboot requests, there are > some other possible explanations: > > - perhaps the SIPI is getting lost because one of the CPUs is still > resetting or still processing a SIPI from the last reboot? > > - the seabios code itself may have been corrupted if the memcpy() in > qemu_prep_reset() got far enough along to clear HaveRunPost, but did > not get far enough along to fully complete the memcpy(). > > If the failure is reproducible, the patch below could help narrow the > possibilities. > > -Kevin > > > --- a/src/fw/smp.c > +++ b/src/fw/smp.c > @@ -125,6 +125,7 @@ smp_setup(void) > > // Wait for other CPUs to process the SIPI. > u8 cmos_smp_count = rtc_read(CMOS_BIOS_SMP_COUNT) + 1; > + dprintf(1, "cmos_smp_count=%d\n", cmos_smp_count); > while (cmos_smp_count != CountCPUs) > asm volatile( > // Release lock and allow other processors to use the stack. > @@ -136,6 +137,7 @@ smp_setup(void) > " jc 1b\n" > : "+m" (SMPLock), "+m" (SMPStack) > : : "cc", "memory"); > + dprintf(1, "finish smp\n"); > yield(); > > // Restore memory. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html