On 2012-08-17 16:41, Jan Kiszka wrote:
> On 2012-08-17 16:36, Jan Kiszka wrote:
>> On 2012-08-17 15:11, Jan Kiszka wrote:
>>> On 2012-08-06 17:11, Stefan Hajnoczi wrote:
>>>> On Thu, Jun 28, 2012 at 2:05 PM, Peter Lieven <pl@xxxxxxxxx> wrote:
>>>>> I debugged my initial problem further and found out that the problem
>>>>> is that the main thread is stuck in pause_all_vcpus() on reset or
>>>>> quit commands in the monitor if one cpu is stuck in the do-while
>>>>> loop in kvm_cpu_exec. If I modify the condition from
>>>>> while (ret == 0) to while ((ret == 0) && !env->stop); it works, but
>>>>> is this the right fix? The "Quit" command seems to work, but on
>>>>> "Reset" the VM enters pause state.
>>>>
>>>> I think I'm hitting something similar. I installed a F17 amd64 guest
>>>> (3.5 kernel) but before booting entered the GRUB boot menu edit mode.
>>>> The guest seemed unresponsive so I switched to the monitor, which also
>>>> froze shortly afterwards. The VNC screen ended up being all black.
>>>>
>>>> qemu-kvm.git/master 3e4305694fd891b69e4450e59ec4c65420907ede
>>>> Linux 3.2.0-3-amd64 from Debian testing
>>>>
>>>> $ qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -drive
>>>> if=virtio,cache=none,file=f17.img,aio=native -serial stdio
>>>>
>>>> (gdb) thread apply all bt
>>>>
>>>> Thread 3 (Thread 0x7f8008e23700 (LWP 367)):
>>>> #0  0x00007f800f891727 in ioctl () at ../sysdeps/unix/syscall-template.S:82
>>>> #1  0x00007f80137b92c9 in kvm_vcpu_ioctl
>>>>     (env=env@entry=0x7f8015b49640, type=type@entry=44672)
>>>>     at /home/stefanha/qemu-kvm/kvm-all.c:1619
>>>> #2  0x00007f80137b93fe in kvm_cpu_exec (env=env@entry=0x7f8015b49640)
>>>>     at /home/stefanha/qemu-kvm/kvm-all.c:1506
>>>> #3  0x00007f8013766f31 in qemu_kvm_cpu_thread_fn (arg=0x7f8015b49640)
>>>>     at /home/stefanha/qemu-kvm/cpus.c:756
>>>> #4  0x00007f800fb4db50 in start_thread (arg=<optimized out>) at
>>>>     pthread_create.c:304
>>>> #5  0x00007f800f8986dd in clone () at
>>>>     ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>>> #6  0x0000000000000000 in ?? ()
>>>>
>>>> This vcpu is still executing guest code and I've seen it successfully
>>>> dispatching I/O. The problem is it's missing the exit_request...
>>>>
>>>> Thread 2 (Thread 0x7f8008622700 (LWP 368)):
>>>> #0  pthread_cond_wait@@GLIBC_2.3.2 ()
>>>>     at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>>>> #1  0x00007f801372b229 in qemu_cond_wait (cond=<optimized out>,
>>>>     mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>>>> #2  0x00007f8013766eff in qemu_kvm_wait_io_event (env=<optimized out>)
>>>>     at /home/stefanha/qemu-kvm/cpus.c:724
>>>> #3  qemu_kvm_cpu_thread_fn (arg=0x7f8015b67450) at
>>>>     /home/stefanha/qemu-kvm/cpus.c:761
>>>> #4  0x00007f800fb4db50 in start_thread (arg=<optimized out>) at
>>>>     pthread_create.c:304
>>>> #5  0x00007f800f8986dd in clone () at
>>>>     ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>>> #6  0x0000000000000000 in ?? ()
>>>>
>>>> No problems here.
>>>>
>>>> Thread 1 (Thread 0x7f801347b8c0 (LWP 365)):
>>>> #0  pthread_cond_wait@@GLIBC_2.3.2 ()
>>>>     at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>>>> #1  0x00007f801372b229 in qemu_cond_wait (cond=cond@entry=0x7f801402fd80,
>>>>     mutex=mutex@entry=0x7f80144367c0) at qemu-thread-posix.c:113
>>>> #2  0x00007f8013768949 in pause_all_vcpus () at
>>>>     /home/stefanha/qemu-kvm/cpus.c:962
>>>> #3  0x00007f80136028c8 in main (argc=<optimized out>, argv=<optimized out>,
>>>>     envp=<optimized out>) at /home/stefanha/qemu-kvm/vl.c:3695
>>>>
>>>> We're deadlocked in pause_all_vcpus(), waiting for vcpu #0 to pause.
>>>> Unfortunately vcpu #0 has ->exit_request=0 although ->stop=1.
>>>>
>>>> Here are the vcpus:
>>>>
>>>> (gdb) p first_cpu
>>>> $6 = (struct CPUX86State *) 0x7f8015b49640
>>>> (gdb) p first_cpu->next_cpu
>>>> $7 = (struct CPUX86State *) 0x7f8015b67450
>>>> (gdb) p first_cpu->next_cpu->next_cpu
>>>> $8 = (struct CPUX86State *) 0x0
>>>>
>>>> (gdb) p first_cpu->stop
>>>> $9 = 1
>>>> (gdb) p first_cpu->stopped
>>>> $10 = 0
>>>> (gdb) p first_cpu->exit_request
>>>> $11 = 0
>>>
>>> CPUState::exit_request is only set on specific synchronous events, see
>>> target-i386/kvm.c.
>>>
>>> More interesting is CPUState::thread_kicked. If it's set, qemu_cpu_kick
>>> will skip the kicking via a signal. Maybe there is some race. Let me
>>> think about such possibilities again...
>>
>> diff --git a/cpus.c b/cpus.c
>> index e476a3c..30f3228 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -726,6 +726,9 @@ static void qemu_kvm_wait_io_event(CPUArchState *env)
>>      }
>>
>>      qemu_kvm_eat_signals(env);
>> +    /* Ensure that checking env->stop cannot overtake signal processing so
>> +     * that we lose the latter without stopping. */
>> +    smp_rmb();
>
> rmb is nonsense. Should be a plain barrier() - if at all.
>
>>      qemu_wait_io_event_common(env);
>>  }
>>
>> Can anyone imagine that such a barrier may actually be required?
>> If it is currently possible that env->stop is evaluated before we called
>> into sigtimedwait in qemu_kvm_eat_signals, then we could actually eat the
>> signal without properly processing its reason (stop).

Should not be required (TM): Both signal eating / stop checking and stop
setting / signal generation happen under the BQL, thus the ordering must
not make a difference here.

Don't see where we could lose a signal. Maybe due to a subtle memory
corruption that sets thread_kicked to non-zero, preventing the kicking
this way.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
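
To make the thread_kicked hypothesis above concrete, here is a minimal,
self-contained C model. It is only a sketch and not QEMU code: struct
vcpu_model, model_cpu_kick() and model_request_stop() are hypothetical
names, and the kick signal is replaced by a plain counter. The only
behaviour it assumes is the one stated in this thread, namely that the
kick is skipped when thread_kicked is already set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-vcpu state discussed in
 * this thread. It is NOT QEMU's CPUState; only the fields needed for the
 * illustration are modelled. */
struct vcpu_model {
    bool stop;          /* set by the thread that wants the vcpu to pause */
    bool thread_kicked; /* set once a kick signal has been sent */
    int signals_sent;   /* stands in for the real signal delivery */
};

/* Model of the kick path: the "signal" is only sent if no kick is
 * already pending according to thread_kicked. */
static void model_cpu_kick(struct vcpu_model *cpu)
{
    assert(cpu != NULL);
    if (!cpu->thread_kicked) {
        cpu->signals_sent++;
        cpu->thread_kicked = true;
    }
}

/* Model of a pause request: set stop, then kick the vcpu thread. */
static void model_request_stop(struct vcpu_model *cpu)
{
    assert(cpu != NULL);
    cpu->stop = true;
    model_cpu_kick(cpu);
}
```

If the model starts with thread_kicked already true (for example after
the suspected memory corruption), model_request_stop() leaves stop set
but sends no signal, which would be consistent with the stop=1,
stopped=0 state in the gdb output above. Whether this is what actually
happens in the reported hang is unconfirmed.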