On 03/31/2013 06:20 PM, Peter Krempa wrote: > This series fixes the crash resulting from a race condition in the connection > close callback. To observe the crash apply the first patch only. To verify that > the patchset fixes the crash please apply all but 2/6 and verify using virsh. > 2/6 fixes the crash in a redundant way in case the close callback is used. In > the case it isn't 2/6 itself can't fix the issue. > > For a better explanation of this problem please see the description in 6/6. > > Peter Krempa (4): > DO NOT APPLY UPSTREAM: Close callback race corruption crash > reproducer. > virsh: Move cmdConnect from virsh-host.c to virsh.c > virsh: Register and unregister the close callback also in cmdConnect > rpc: Fix connection close callback race condition and memory > corruption/crash > > Viktor Mihajlovski (2): > libvirt: Increase connection reference count for callbacks > virsh: Unregister the connection close notifier upon termination > > src/datatypes.c | 55 ++++++++++++++++++++---- > src/datatypes.h | 22 +++++++--- > src/libvirt.c | 30 ++++++++----- > src/remote/remote_driver.c | 62 ++++++++++++++++----------- > src/rpc/virnetclient.c | 9 +++- > tools/virsh-host.c | 67 ----------------------------- > tools/virsh.c | 102 ++++++++++++++++++++++++++++++++++++++++++--- > 7 files changed, 225 insertions(+), 122 deletions(-) > I fear we're yet not thru this. Today I had a segfault doing a migration using virsh migrate --verbose --live $guest qemu+ssh://$host/system. This is with Friday's git HEAD. The migration took very long (but succeeded except for the libvirt crash) so there still seems to be a race lingering in the object reference counting exposed by the --verbose option (getjobinfo?). (gdb) bt #0 qemuDomainGetJobInfo (dom=<optimized out>, info=0x3fffaaaaa70) at qemu/qemu_driver.c:10166 #1 0x000003fffd4bbe68 in virDomainGetJobInfo (domain=0x3ffe4002660, info=0x3fffaaaaa70) at libvirt.c:17440 #2 0x000002aace36b528 in remoteDispatchDomainGetJobInfo (server=<optimized out>, msg=<optimized out>, ret=0x3ffe40029d0, args=0x3ffe40026a0, rerr=0x3fffaaaac20, client=<optimized out>) at remote_dispatch.h:2069 #3 remoteDispatchDomainGetJobInfoHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>, rerr=0x3fffaaaac20, args=0x3ffe40026a0, ret=0x3ffe40029d0) at remote_dispatch.h:2045 #4 0x000003fffd500384 in virNetServerProgramDispatchCall (msg=0x2ab035dd800, client=0x2ab035df5d0, server=0x2ab035ca370, prog=0x2ab035cf210) at rpc/virnetserverprogram.c:439 #5 virNetServerProgramDispatch (prog=0x2ab035cf210, server=0x2ab035ca370, client=0x2ab035df5d0, msg=0x2ab035dd800) at rpc/virnetserverprogram.c:305 #6 0x000003fffd4fad3c in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x2ab035ca370) at rpc/virnetserver.c:162 #7 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x2ab035ca370) at rpc/virnetserver.c:183 #8 0x000003fffd42a91c in virThreadPoolWorker (opaque=opaque@entry=0x2ab035a9e60) at util/virthreadpool.c:144 #9 0x000003fffd42a236 in virThreadHelper (data=<optimized out>) at util/virthreadpthread.c:161 #10 0x000003fffcdee412 in start_thread () from /lib64/libpthread.so.0 #11 0x000003fffcd30056 in thread_start () from /lib64/libc.so.6 (gdb) l 10161 if (!(vm = qemuDomObjFromDomain(dom))) 10162 goto cleanup; 10163 10164 priv = vm->privateData; 10165 10166 if (virDomainObjIsActive(vm)) { 10167 if (priv->job.asyncJob && !priv->job.dump_memory_only) { 10168 memcpy(info, &priv->job.info, sizeof(*info)); 10169 10170 /* Refresh elapsed time again just to ensure it (gdb) print *vm $1 = {parent = {parent = {magic = 3735928559, refs = 0, klass = 0xdeadbeef}, lock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}}, pid = 0, state = {state = 0, reason = 0}, autostart = 0, persistent = 0, updated = 0, def = 0x0, newDef = 0x0, snapshots = 0x0, current_snapshot = 0x0, hasManagedSave = false, privateData = 0x0, privateDataFreeFunc = 0x0, taint = 0} I am currently blocked with other work but if anyone has a theory that I should verify let me know... -- Mit freundlichen Grüßen/Kind Regards Viktor Mihajlovski IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Köderitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list