Hi all,
I have twice hit an infinite loop in red_worker.
Both times it happened after I shut down a Windows guest from 'spicy':
I watched "Windows is shutting down..." appear,
then closed the 'spicy' window.
Usually the qemu process quits after a few seconds,
but occasionally it spins forever instead.
Here are the details (long, but fun, I think :)
===================== process status ==============================
top - 17:54:37 up 65 days, 8:02, 3 users, load average: 1.93, 1.69, 1.68
Tasks: 321 total, 2 running, 316 sleeping, 3 stopped, 0 zombie
Cpu(s): 3.6%us, 1.8%sy, 0.0%ni, 94.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 49401784k total, 37162056k used, 12239728k free, 600832k buffers
Swap: 39535612k total, 15148k used, 39520464k free, 28640352k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7409 root 20 0 1620m 1.1g 7660 S 100 2.4 53:03.95 qemu-system-x86 <== our qemu process (pid 7409) keeps consuming 1 cpu on host
$ ps -p 7409 -L -l
F S UID PID PPID LWP C PRI NI ADDR SZ WCHAN TTY TIME CMD
6 S 0 7409 1 7409 0 80 0 - 414790 unix_s pts/1 00:00:15 qemu-system-x86
3 S 0 7409 1 7431 0 80 0 - 414790 kvm_vc pts/1 00:00:22 qemu-system-x86
3 S 0 7409 1 7432 0 80 0 - 414790 kvm_vc pts/1 00:00:21 qemu-system-x86
3 R 0 7409 1 7433 97 80 0 - 414790 ? pts/1 02:07:27 qemu-system-x86 <== one thread goes mad :)
$ sudo netstat --tcp -np | grep 7409
tcp 1 0 172.168.12.23:10088 172.168.12.168:33085 CLOSE_WAIT 7409/qemu-system-x8 <== one of the spice channels failed to clean up
$ sudo netstat -nlp | grep 7409
tcp 0 0 0.0.0.0:10088 0.0.0.0:* LISTEN 7409/qemu-system-x8
unix 2 [ ACC ] STREAM LISTENING 6310958 7409/qemu-system-x8 /home/grizzlybears/q131/dig_into/qmon_win7qq
================ full qemu command line and version =========================
/home/grizzlybears/q131/qemu/x86_64-softmmu/qemu-system-x86_64
-enable-kvm -L /home/grizzlybears/q131/qemu-kvm-build/usr/share/qemu -nodefconfig -nodefaults
-smp 2,sockets=2,cores=8,threads=1 -m 1024
-drive if=virtio,aio=native,file=/home/grizzlybears/images/win7_qq.qcow2
-spice port=10088,addr=0.0.0.0,disable-ticketing,image-compression=auto_glz,streaming-video=filter,disable-copy-paste
-vga qxl
-soundhw hda
-readconfig /home/grizzlybears/q131/qemu/docs/ich9-ehci-uhci.cfg
-chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0
-device usb-tablet,id=input0,bus=ehci.0
-netdev tap,id=nic1,script=/home/grizzlybears/q131/qemu-ifup,downscript=/home/grizzlybears/q131/qemu-ifdown -device virtio-net-pci,netdev=nic1,mac=52:54:00:fb:d5:88
-balloon none -no-hpet -rtc base=localtime,driftfix=none -global kvm-pit.lost_tick_policy=discard
-chardev socket,id=charmonitor,path=/home/grizzlybears/q131/dig_into/qmon_win7qq,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline
BTW, my source versions:
qemu 1.3.1
spice 0.12.3
============== then profiling ======================
I used 'perf' to inspect qemu and found something interesting.
# Events: 78K cycles
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... ..............................................
#
     9.73% qemu-system-x86 [kernel.kallsyms] [k] fget_light
           |
           --- fget_light
               |
               |--97.60%-- sys_ioctl
               |           system_call_fastpath
               |           __ioctl
               |           |
               |           --100.00%-- spice_timer_queue_cb
               |                       red_worker_main
               |                       start_thread
               |
               --2.40%-- system_call_fastpath
                         __ioctl
                         |
                         --100.00%-- spice_timer_queue_cb
                                     red_worker_main
                                     start_thread

     6.47% qemu-system-x86 qemu-system-x86_64 [.] ring_add
           |
           --- ring_add
               |
               |--96.68%-- ring_add_before
               |           _spice_timer_set
               |           |
               |           |--52.97%-- spice_timer_set
               |           |           red_channel_client_start_ping_timer
               |           |           red_channel_client_ping_timer
               |           |           spice_timer_queue_cb
               |           |           red_worker_main
               |           |           start_thread
               |           |
               |           --47.03%-- spice_timer_queue_cb
...
The whole profiling graph is very long, so I attached it to this message.
The graph shows that almost every call path goes through
'spice_timer_queue_cb()' and 'red_worker_main()'.
============= So, let's take a look at red_worker_main() =============
SPICE_GNUC_NORETURN void *red_worker_main(void *arg)
{
    RedWorker *worker = spice_malloc(sizeof(RedWorker));

    /* ... some initialization (abridged) ... */

    worker->event_timeout = INF_EVENT_WAIT;
    for (;;) {                              /* <== big loop */
        int i, num_events;
        unsigned int timers_queue_timeout;

        timers_queue_timeout = spice_timer_queue_get_timeout_ms();
        worker->event_timeout = MIN(red_get_streams_timout(worker), worker->event_timeout);
        worker->event_timeout = MIN(timers_queue_timeout, worker->event_timeout);
        num_events = poll(worker->poll_fds, MAX_EVENT_SOURCES, worker->event_timeout);
        red_handle_streams_timout(worker);
        spice_timer_queue_cb();             /* <== yes, this is where the CPU is actually consumed */

        if (worker->display_channel) {
            /* during migration, in the dest, the display channel can be initialized
               while the global lz data not since migrate data msg hasn't been received yet */
            red_channel_apply_clients(&worker->display_channel->common.base, red_display_cc_free_glz_drawables);
        }

        worker->event_timeout = INF_EVENT_WAIT;
        if (num_events == -1) {
            if (errno != EINTR) { spice_error("poll failed, %s", strerror(errno)); }
        }

        for (i = 0; i < MAX_EVENT_SOURCES; i++) {
            /* ... process any 'ready' io watchers (abridged) ... */
        }

        /* ... clear the poll_fd for any removed watches (abridged) ... */

        if (worker->running) {
            int ring_is_empty;
            red_process_cursor(worker, MAX_PIPE_SIZE, &ring_is_empty);
            red_process_commands(worker, MAX_PIPE_SIZE, &ring_is_empty);
        }
        red_push(worker);
    }
    abort();
}
============= gdb failed to attach to the live process (I don't know why), so I took a core dump and loaded it in gdb ==========
(gdb) thread 2
[Switching to thread 2 (Thread 7433)]#0 0x00007fa93df59c87 in ioctl () at ../sysdeps/unix/syscall-template.S:82
82 ../sysdeps/unix/syscall-template.S: No such file or directory.
in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0 0x00007fa93df59c87 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007fa9410000d3 in red_channel_client_ping_timer (opaque=0x7fa8e4070070) at red_channel.c:752
#2 0x00007fa940f45370 in spice_timer_queue_cb () at spice_timer_queue.c:262
#3 0x00007fa940f289e6 in red_worker_main (arg=0x7fffed47bd00) at red_worker.c:12265
#4 0x00007fa93ec1a9ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#5 0x00007fa93df6221d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#6 0x0000000000000000 in ?? ()
(gdb) frame 3
#3 0x00007fa940f289e6 in red_worker_main (arg=0x7fffed47bd00) at red_worker.c:12265
12265 spice_timer_queue_cb();
(gdb) l
12260 timers_queue_timeout = spice_timer_queue_get_timeout_ms();
12261 worker->event_timeout = MIN(red_get_streams_timout(worker), worker->event_timeout);
12262 worker->event_timeout = MIN(timers_queue_timeout, worker->event_timeout);
12263 num_events = poll(worker->poll_fds, MAX_EVENT_SOURCES, worker->event_timeout);
12264 red_handle_streams_timout(worker);
12265 spice_timer_queue_cb();
12266
12267 if (worker->display_channel) {
12268 /* during migration, in the dest, the display channel can be initialized
12269 while the global lz data not since migrate data msg hasn't been
(gdb) print num_events
$1 = 0 <== no fd is ready; poll() returned because it timed out
(gdb) print worker->poll_fds
$4 = {{fd = 20, events = 1, revents = 0}, {fd = 29, events = 5, revents = 0}, {fd = 31, events = 1, revents = 0}, {fd = -1, events = 0, revents = 0} <repeats 17 times>}
ll /proc/7409/fd (taken before I made the core dump):
lrwx------ 1 root root 64 2013-12-12 18:08 20 -> socket:[6309964]
lrwx------ 1 root root 64 2013-12-12 18:08 29 -> socket:[6299877]
lrwx------ 1 root root 64 2013-12-12 18:08 31 -> socket:[6261651]
(gdb) print worker->event_timeout
$5 = 10 <== curious number
(gdb) print worker->running
$7 = 1
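For reference, here is a minimal standalone sketch of what num_events == 0 means. It is not spice code, and demo_poll_idle() is a hypothetical name I made up for illustration. It polls one idle descriptor plus one unused slot with fd = -1 (poll() ignores negative fds, like the 17 empty entries in worker->poll_fds above) and returns poll()'s result.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only (not part of spice).
 * Polls the read end of an empty pipe and one unused slot (fd = -1).
 * Returns poll()'s result, or -2 if pipe() itself fails.
 */
int demo_poll_idle(int timeout_ms)
{
    int fds[2];
    struct pollfd pfds[2];
    int rc;

    if (pipe(fds) == -1) {
        return -2;
    }

    pfds[0].fd = fds[0];      /* nothing is ever written, so never readable */
    pfds[0].events = POLLIN;
    pfds[0].revents = 0;

    pfds[1].fd = -1;          /* negative fd: poll() skips this entry */
    pfds[1].events = 0;
    pfds[1].revents = 0;

    rc = poll(pfds, 2, timeout_ms);   /* 0 means "timed out, nothing ready" */

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

Calling demo_poll_idle(10) should return 0 after roughly 10 ms. If the worker loop really went back to sleep in poll() for 10 ms on every iteration, it would wake about 100 times per second, which by itself should not amount to 100% CPU.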
=========== Finally, my questions are: ==================
1. The big loop has no exit, and I could not find 'thread_exit' anywhere in the spice source.
How does the red_worker thread quit in the normal case, or does it just run until the qemu main thread quits?
2. As I understand it, spice communicates with spice-gtk over TCP,
but here we don't see TCP fds, only 3 unix sockets. What are those unix sockets for?
3. Although the poll() timeout is non-zero, the big loop still consumes 100% CPU.
Does this mean something is wrong in the timer function, red_channel_client_ping_timer()?
Thanks for reading,
and thanks even more for any hints and ideas :)
Attachment:
qemu_hung_on_vm_shutdown
Description: Binary data