* Gerd Hoffmann (kraxel@xxxxxxxxxx) wrote: > Hi, > > > >> On a quick glance I'd blame the guest for sending corrupted commands. > > >> Strange though that it happens on migration only, so there could be > > >> a host issue too. Or a timing issue triggered by migration. > > >> > > >> Which migration phase? > > > > > > This is the point at which it switches over in postcopy. > > > > It looks like it's the vmstate (post) load phase of the qxl device on > > destination host. > > Dave, can you try "thread apply all bt" so we see the other threads too? > That should show whenever it happens in post_load Yes, I already have the full set of threads; you can see the qxl_post_load in thread 1. red_dispatcher_loadvm_commands: id 0, group 0, virt start 0, virt end ffffffffffffffff, generation 0, delta 0 id 1, group 1, virt start 7fbe83c00000, virt end 7fbe87bfe000, generation 0, delta 7fbe83c00000 id 2, group 1, virt start 7fbe7fa00000, virt end 7fbe83a00000, generation 0, delta 7fbe7fa00000 (./x86_64-softmmu/qemu-system-x86_64:22376): Spice-CRITICAL **: red_memslots.c:123:get_virt: slot_id 128 too big, addr=8000000000000000 Thread 12 (Thread 0x7fc0a0df2700 (LWP 22377)): #0 0x00007fc0aa42f1bd in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007fc0aa42ad02 in _L_lock_791 () from /lib64/libpthread.so.0 #2 0x00007fc0aa42ac08 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x0000556465736839 in qemu_mutex_lock (mutex=mutex@entry=0x556465d76120 <qemu_global_mutex>) at /root/git/qemu/util/qemu-thread-posix.c:64 #4 0x00005564653e69d6 in qemu_mutex_lock_iothread () at /root/git/qemu/cpus.c:1296 #5 0x000055646574596e in call_rcu_thread (opaque=<optimized out>) at /root/git/qemu/util/rcu.c:257 #6 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0 #7 0x00007fc0a61786ed in clone () from /lib64/libc.so.6 Thread 11 (Thread 0x7fc09f304700 (LWP 22379)): #0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000556465736999 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x556465d76120 <qemu_global_mutex>) at /root/git/qemu/util/qemu-thread-posix.c:137 #2 0x00005564653e6fe3 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /root/git/qemu/cpus.c:964 #3 qemu_kvm_cpu_thread_fn (arg=0x556466688740) at /root/git/qemu/cpus.c:1003 #4 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0 #5 0x00007fc0a61786ed in clone () from /lib64/libc.so.6 Thread 10 (Thread 0x7fc09eb03700 (LWP 22380)): #0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000556465736999 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x556465d76120 <qemu_global_mutex>) at /root/git/qemu/util/qemu-thread-posix.c:137 #2 0x00005564653e6fe3 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /root/git/qemu/cpus.c:964 #3 qemu_kvm_cpu_thread_fn (arg=0x5564666ea960) at /root/git/qemu/cpus.c:1003 #4 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0 #5 0x00007fc0a61786ed in clone () from /lib64/libc.so.6 Thread 9 (Thread 0x7fc09e302700 (LWP 22381)): #0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000556465736999 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x556465d76120 <qemu_global_mutex>) at /root/git/qemu/util/qemu-thread-posix.c:137 #2 0x00005564653e6fe3 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /root/git/qemu/cpus.c:964 #3 qemu_kvm_cpu_thread_fn (arg=0x55646670a120) at /root/git/qemu/cpus.c:1003 #4 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0 #5 0x00007fc0a61786ed in clone () from /lib64/libc.so.6 Thread 8 (Thread 0x7fc09db01700 (LWP 22382)): #0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000556465736999 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x556465d76120 <qemu_global_mutex>) at /root/git/qemu/util/qemu-thread-posix.c:137 #2 0x00005564653e6fe3 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /root/git/qemu/cpus.c:964 #3 qemu_kvm_cpu_thread_fn (arg=0x5564667298d0) at /root/git/qemu/cpus.c:1003 #4 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0 #5 0x00007fc0a61786ed in clone () from /lib64/libc.so.6 Thread 7 (Thread 0x7fbe7f9ff700 (LWP 22383)): #0 0x00007fc0aa42f49d in read () from /lib64/libpthread.so.0 #1 0x00007fc0a8c36c01 in spice_backtrace_gstack () from /lib64/libspice-server.so.1 #2 0x00007fc0a8c3e4f7 in spice_logv () from /lib64/libspice-server.so.1 #3 0x00007fc0a8c3e655 in spice_log () from /lib64/libspice-server.so.1 #4 0x00007fc0a8bfc6de in get_virt () from /lib64/libspice-server.so.1 #5 0x00007fc0a8bfcb73 in red_get_data_chunks_ptr () from /lib64/libspice-server.so.1 #6 0x00007fc0a8bff3fa in red_get_cursor_cmd () from /lib64/libspice-server.so.1 #7 0x00007fc0a8c0fd79 in handle_dev_loadvm_commands () from /lib64/libspice-server.so.1 #8 0x00007fc0a8bf9523 in dispatcher_handle_recv_read () from /lib64/libspice-server.so.1 #9 0x00007fc0a8c1d5a5 in red_worker_main () from /lib64/libspice-server.so.1 #10 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0 #11 0x00007fc0a61786ed in clone () from /lib64/libc.so.6 Thread 6 (Thread 0x7fbe7efff700 (LWP 22385)): #0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000556465736999 in qemu_cond_wait (cond=cond@entry=0x556466ecde00, mutex=mutex@entry=0x556466ecde30) at /root/git/qemu/util/qemu-thread-posix.c:137 #2 0x0000556465680c5b in vnc_worker_thread_loop (queue=queue@entry=0x556466ecde00) at /root/git/qemu/ui/vnc-jobs.c:228 #3 0x0000556465681198 in vnc_worker_thread (arg=0x556466ecde00) at /root/git/qemu/ui/vnc-jobs.c:335 #4 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0 #5 0x00007fc0a61786ed in clone () from /lib64/libc.so.6 Thread 5 (Thread 0x7fc0a05f1700 (LWP 22958)): #0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000556465736999 in qemu_cond_wait (cond=cond@entry=0x556467b89730, mutex=mutex@entry=0x556467b89708) at /root/git/qemu/util/qemu-thread-posix.c:137 #2 0x000055646540a643 in do_data_decompress (opaque=0x556467b89700) at /root/git/qemu/migration/ram.c:2284 #3 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007fc0a61786ed in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x7fbe7dfff700 (LWP 22959)): #0 0x00007fc0aa42c6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000556465736999 in qemu_cond_wait (cond=cond@entry=0x556467b897a8, mutex=mutex@entry=0x556467b89780) at /root/git/qemu/util/qemu-thread-posix.c:137 #2 0x000055646540a643 in do_data_decompress (opaque=0x556467b89778) at /root/git/qemu/migration/ram.c:2284 #3 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007fc0a61786ed in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x7fbe7d7fe700 (LWP 22967)): #0 0x00007fc0a616ddad in poll () from /lib64/libc.so.6 #1 0x0000556465631868 in poll (__timeout=-1, __nfds=2, __fds=0x7fbe7d7fd990) at /usr/include/bits/poll2.h:46 #2 postcopy_ram_fault_thread (opaque=0x556466c93f10) at /root/git/qemu/migration/postcopy-ram.c:405 #3 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0 #4 0x00007fc0a61786ed in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x7fbe7cffd700 (LWP 22968)): #0 0x00007fc0a6172ba9 in syscall () from /lib64/libc.so.6 #1 0x0000556465736ca6 in futex_wait (val=<optimized out>, ev=<optimized out>) at /root/git/qemu/util/qemu-thread-posix.c:306 #2 qemu_event_wait (ev=ev@entry=0x556466c93f18) at /root/git/qemu/util/qemu-thread-posix.c:422 #3 0x000055646541044d in postcopy_ram_listen_thread (opaque=0x556467e45740) at /root/git/qemu/migration/savevm.c:1485 #4 0x00007fc0aa428dc5 in start_thread () from /lib64/libpthread.so.0 #5 0x00007fc0a61786ed in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x7fc0aead5c40 (LWP 22376)): #0 0x00007fc0aa42f49d in read () from /lib64/libpthread.so.0 #1 0x00007fc0a8bf9264 in read_safe () from /lib64/libspice-server.so.1 #2 0x00007fc0a8bf9717 in dispatcher_send_message () from /lib64/libspice-server.so.1 #3 0x00007fc0a8bfa0c2 in red_dispatcher_loadvm_commands () from /lib64/libspice-server.so.1 #4 0x000055646556c03d in qxl_spice_loadvm_commands (qxl=qxl@entry=0x55646755b8c0, ext=ext@entry=0x556467a895a0, count=2) at /root/git/qemu/hw/display/qxl.c:219 #5 0x000055646556d15f in qxl_post_load (opaque=0x55646755b8c0, version=<optimized out>) at /root/git/qemu/hw/display/qxl.c:2212 #6 0x000055646562f1b8 in vmstate_load_state (f=f@entry=0x5564666347d0, vmsd=<optimized out>, opaque=0x55646755b8c0, version_id=version_id@entry=21) at /root/git/qemu/migration/vmstate.c:151 #7 0x000055646540f4a1 in vmstate_load (f=0x5564666347d0, se=0x5564676f90a0, version_id=21) at /root/git/qemu/migration/savevm.c:690 #8 0x000055646540f6db in qemu_loadvm_section_start_full (f=f@entry=0x5564666347d0, mis=mis@entry=0x556466c93f10) at /root/git/qemu/migration/savevm.c:1843 #9 0x000055646540f9ac in qemu_loadvm_state_main (f=f@entry=0x5564666347d0, mis=mis@entry=0x556466c93f10) at /root/git/qemu/migration/savevm.c:1900 #10 0x000055646540fd8f in loadvm_handle_cmd_packaged (mis=0x556466c93f10) at /root/git/qemu/migration/savevm.c:1660 #11 loadvm_process_command (f=0x556467e45740) at /root/git/qemu/migration/savevm.c:1723 #12 qemu_loadvm_state_main (f=f@entry=0x556467e45740, mis=mis@entry=0x556466c93f10) at /root/git/qemu/migration/savevm.c:1913 #13 0x0000556465412546 in qemu_loadvm_state (f=f@entry=0x556467e45740) at /root/git/qemu/migration/savevm.c:1973 #14 0x000055646562b4e8 in process_incoming_migration_co (opaque=0x556467e45740) at /root/git/qemu/migration/migration.c:394 #15 0x0000556465746ada in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /root/git/qemu/util/coroutine-ucontext.c:79 #16 0x00007fc0a60c7cf0 in ?? () from /lib64/libc.so.6 #17 0x00007ffe14885180 in ?? () #18 0x0000000000000000 in ?? () > > Maybe if you trace qxl device save/load related functions > > on both src and dst hosts you'll see a difference. > > qxl keeps references to certain commands (create surface for example) in > qxl device memory, so it can replay them in post_load. That possibly > doesn't work correctly with postcopy. It should; the device memory is just a RAMBlock that's migrated, so if it's not arrived yet from the source the qxl code will block until postcopy drags it across; assuming that is that the qxl code on the source isn't still trying to write to it's copy at the same time, which at this point it shouldn't. Dave > cheers, > Gerd > -- Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK _______________________________________________ Spice-devel mailing list Spice-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/spice-devel