On Fri, May 18, 2012 at 11:55:05AM +0800, Daniel Veillard wrote:
> One of the outputs raised frequently is:
> ----------------------------------------------------------------------
> ==5838== WARNING: Possible data race during write of size 4 at 0x34723B3BD0: {{{
> ==5838==    T1 (L{L17, L125}):
> ==5838==     #0  ptmalloc_unlock_all2 /lib64/libc-2.14.90.so
> ==5838==     #1  fork /lib64/libc-2.14.90.so
> ==5838==     #2  virFork /usr/src/debug/libvirt-0.9.12/src/util/command.c:268
> ==5838==     #3  virExecWithHook.constprop.5 /usr/src/debug/libvirt-0.9.12/src/util/command.c:489
> ==5838==     #4  virCommandRunAsync /usr/src/debug/libvirt-0.9.12/src/util/command.c:2177
> ==5838==     #5  virCommandRun /usr/src/debug/libvirt-0.9.12/src/util/command.c:1970
> ==5838==     #6  qemuCapsExtractVersionInfo /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_capabilities.c:1379
> ==5838==     #7  qemuCapsExtractVersion /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_capabilities.c:1568
> ==5838==     #8  qemudGetVersion /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_driver.c:1320
> ==5838==     #9  virConnectGetVersion /usr/src/debug/libvirt-0.9.12/src/libvirt.c:1640
> ==5838==     #10 remoteDispatchGetVersionHelper /usr/src/debug/libvirt-0.9.12/daemon/remote_dispatch.h:6862
> ==5838==     #11 virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
> ==5838==   Concurrent read(s) happened at (OR AFTER) these points:
> ==5838==    T11 (L{}):
> ==5838==     #0  arena_thread_freeres /lib64/libc-2.14.90.so
> ==5838==     #1  __libc_thread_freeres /lib64/libc-2.14.90.so
> ==5838==     #2  start_thread /lib64/libpthread-2.14.90.so
> ==5838==   Address 0x34723B3BD0 is 0 bytes inside data symbol "list_lock"
> ==5838==   Locks involved in this report (reporting last lock sites): {L17, L125}
> ==5838==    L17 (0x5339B00)
> ==5838==     #0  pthread_mutex_lock /mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:935
> ==5838==     #1  virFork /usr/src/debug/libvirt-0.9.12/src/util/command.c:266
> ==5838==     #2  virExecWithHook.constprop.5 /usr/src/debug/libvirt-0.9.12/src/util/command.c:489
> ==5838==     #3  virCommandRunAsync /usr/src/debug/libvirt-0.9.12/src/util/command.c:2177
> ==5838==     #4  virCommandRun /usr/src/debug/libvirt-0.9.12/src/util/command.c:1970
> ==5838==     #5  qemuCapsExtractVersionInfo /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_capabilities.c:1379
> ==5838==     #6  qemuCapsExtractVersion /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_capabilities.c:1568
> ==5838==     #7  qemudGetVersion /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_driver.c:1320
> ==5838==     #8  virConnectGetVersion /usr/src/debug/libvirt-0.9.12/src/libvirt.c:1640
> ==5838==     #9  remoteDispatchGetVersionHelper /usr/src/debug/libvirt-0.9.12/daemon/remote_dispatch.h:6862
> ==5838==     #10 virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
> ==5838==     #11 virNetServerHandleJob /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
> ==5838==    L125 (0x4C0A0A00)
> ==5838==     #0  pthread_mutex_lock /mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:935
> ==5838==     #1  qemudGetVersion /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_driver.c:1319
> ==5838==     #2  virConnectGetVersion /usr/src/debug/libvirt-0.9.12/src/libvirt.c:1640
> ==5838==     #3  remoteDispatchGetVersionHelper /usr/src/debug/libvirt-0.9.12/daemon/remote_dispatch.h:6862
> ==5838==     #4  virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
> ==5838==     #5  virNetServerHandleJob /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
> ==5838==     #6  virThreadPoolWorker /usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
> ==5838==     #7  virThreadHelper /usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
> ==5838==   Race verifier data: 0x347207CF98,0x347215D6BB
> ==5838== }}}
>
>   I don't know what "Race verifier data" means, but it seems that we do
> *pid = fork(); in the main thread and that another thread is accessing
> without synchronization. The program properly reports that 2 locks are
> held at that point, virLogLock() and qemuDriverLock(), and gives the
> full stack traces of where those locks were taken. That's convenient,
> but I could not really spot where the second thread did the access;
> I suspect that's somehow related to libpthread, considering the minimal
> stack trace of the second thread T11 ...

No, those first two stack traces are referring to the locks inside glibc's
malloc() implementation, so they are not something in our code. This is
bogus, I believe.
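For context, here is a minimal sketch (illustrative names only, not libvirt's
actual virFork()/virExecWithHook() code) of the usual fork+exec pattern in a
multi-threaded daemon: only the calling thread exists in the child, the child
restricts itself to async-signal-safe calls (no malloc, no stdio) until exec,
and glibc's own atfork handlers (ptmalloc_unlock_all2 in the report above)
take care of its internal malloc locks such as list_lock.

/* Sketch only: how a threaded daemon typically runs an external binary.
 * Names are hypothetical, not libvirt's real virFork()/virExecWithHook(). */
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>

static pid_t
spawn_helper(const char *binary, char *const argv[], char *const envp[])
{
    pid_t pid = fork();             /* other threads are not duplicated */

    if (pid < 0)
        return -1;                  /* fork() itself failed */

    if (pid == 0) {
        /* child: async-signal-safe territory only until exec */
        execve(binary, argv, envp);
        _exit(EXIT_FAILURE);        /* exec failed; skip atexit handlers */
    }

    return pid;                     /* parent: caller reaps the child later */
}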
>   I however found a couple of issues not related to libpthread:
>
> ==5733== WARNING: Possible data race during write of size 1 at 0x34723B6B14: {{{
> ==5733==    T5 (L{}):
> ==5733==     #0  Replace_memcpy /mnt/data/build/slave/full_linux_build/build/tsan/ts_replace.h:113
> ==5733==     #1  __GI_memcpy /mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:2359
> ==5733==     #2  xdrmem_getbytes /lib64/libc-2.14.90.so
> ==5733==     #3  xdr_string /lib64/libc-2.14.90.so
> ==5733==     #4  xdr_remote_nonnull_string /usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:31
> ==5733==     #5  xdr_remote_nonnull_domain /usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:58
> ==5733==     #6  xdr_remote_domain_get_info_args /usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:1390
> ==5733==     #7  virNetMessageDecodePayload /usr/src/debug/libvirt-0.9.12/src/rpc/virnetmessage.c:382
> ==5733==     #8  virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:403
> ==5733==     #9  virNetServerHandleJob /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
> ==5733==     #10 virThreadPoolWorker /usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
> ==5733==     #11 virThreadHelper /usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
> ==5733==   Concurrent write(s) happened at (OR AFTER) these points:
> ==5733==    T2 (L{}):
> ==5733==     #0  xdrmem_getbytes /lib64/libc-2.14.90.so
> ==5733==     #1  xdr_string /lib64/libc-2.14.90.so
> ==5733==     #2  xdr_remote_nonnull_string /usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:31
> ==5733==     #3  xdr_remote_nonnull_domain /usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:58
> ==5733==     #4  xdr_remote_domain_get_xml_desc_args /usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:1532
> ==5733==     #5  virNetMessageDecodePayload /usr/src/debug/libvirt-0.9.12/src/rpc/virnetmessage.c:382
> ==5733==     #6  virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:403
> ==5733==     #7  virNetServerHandleJob /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
> ==5733==     #8  virThreadPoolWorker /usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
> ==5733==     #9  virThreadHelper /usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
> ==5733==   Address 0x34723B6B14 is 0 bytes inside data symbol "crud.6424"
> ==5733==   Race verifier data: 0x4A091CE,0x347212579D
> ==5733== }}}
>
>   That one could potentially be nasty: it seems that two threads decoding
> RPC payloads in parallel hit an unsynchronized write access in the xdr
> routine for xdr_remote_nonnull_string(). I didn't have the debuginfo for
> glibc installed when I ran it, so I don't have the exact information at
> the top of the stacks.

This looks bogus to me too. These are two completely separate RPC messages
being dispatched here (virDomainGetInfo and virDomainGetXMLDesc). If they
truly pointed to the same data, we'd be going SEGV-tastic all over the place.
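For illustration, a minimal sketch (assumed struct and names, nothing from the
real virNetMessageDecodePayload() code) of why parallel decoding of separate
messages is safe: each message carries its own buffer, gets its own xdrmem
stream, and decodes into its own arguments, so two worker threads never touch
the same libvirt data.

/* Sketch only: decode one string argument out of one RPC message.  Each
 * message owns its buffer and its XDR stream; concurrent calls on different
 * 'msg' objects share nothing at this level. */
#include <rpc/xdr.h>
#include <stdint.h>

struct msg {
    char payload[4096];       /* this message's wire bytes */
    uint32_t payload_len;     /* how much of payload[] is valid */
};

static int
decode_name_arg(struct msg *m, char **name, unsigned int maxlen)
{
    XDR xdr;
    bool_t ok;

    xdrmem_create(&xdr, m->payload, m->payload_len, XDR_DECODE);
    *name = NULL;             /* xdr_string() allocates on decode;
                                 the caller frees *name with free() */
    ok = xdr_string(&xdr, name, maxlen);
    xdr_destroy(&xdr);
    return ok ? 0 : -1;
}

Whatever shared state the tool saw would have to be internal to glibc's
xdr_string()/xdrmem_getbytes(), which matches the report pointing at the
glibc data symbol "crud.6424" rather than at anything in the daemon.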
>   The other one raised is actually something I had initially found with
> the first run on libxml2, so it's relatively consistent:

Yay, libxml2 is not thread-safe!

"The function rand() is not reentrant or thread-safe, since it uses hidden
state that is modified on each call. This might just be the seed value to
be used by the next call, or it might be something more elaborate. In order
to get reproducible behavior in a threaded application, this state must be
made explicit; this can be done using the reentrant function rand_r()."

You should probably validate libxml2 against all the functions we have
blacklisted for libvirt in Makefile.nonreentrant.

> ==5733== WARNING: Possible data race during read of size 4 at 0x34723B10B4: {{{
> ==5733==    T1 (L{L173}):
> ==5733==     #0  random_r /lib64/libc-2.14.90.so
> ==5733==     #1  random /lib64/libc-2.14.90.so
> ==5733==     #2  rand /lib64/libc-2.14.90.so
> ==5733==     #3  xmlDictCreate /usr/src/debug/libxml2-2.8.0/dict.c:491
> ==5733==     #4  xmlInitParserCtxt /usr/src/debug/libxml2-2.8.0/parserInternals.c:1620
> ==5733==     #5  xmlNewParserCtxt /usr/src/debug/libxml2-2.8.0/parserInternals.c:1874
> ==5733==     #6  xmlCreateURLParserCtxt /usr/src/debug/libxml2-2.8.0/parser.c:13546
> ==5733==     #7  xmlReadFile /usr/src/debug/libxml2-2.8.0/parser.c:14713
> ==5733==     #8  xmlRelaxNGParse /usr/src/debug/libxml2-2.8.0/relaxng.c:7480
> ==5733==     #9  (no symbols) /usr/lib64/libnetcf.so.1.4.0
> ==5733==     #10 ncf_init /usr/lib64/libnetcf.so.1.4.0
> ==5733==     #11 interfaceOpenInterface /usr/src/debug/libvirt-0.9.12/src/interface/netcf_driver.c:144
> ==5733==   Concurrent write(s) happened at (OR AFTER) these points:
> ==5733==    T10 (L{L156}):
> ==5733==     #0  random_r /lib64/libc-2.14.90.so
> ==5733==     #1  random /lib64/libc-2.14.90.so
> ==5733==     #2  rand /lib64/libc-2.14.90.so
> ==5733==     #3  xmlHashCreate /usr/src/debug/libxml2-2.8.0/hash.c:193
> ==5733==     #4  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:4125
> ==5733==     #5  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6380
> ==5733==     #6  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6523
> ==5733==     #7  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6319
> ==5733==     #8  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6520
> ==5733==     #9  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6523
> ==5733==   Address 0x34723B10B4 is 20 bytes inside data symbol "randtbl"
> ==5733==   Locks involved in this report (reporting last lock sites): {L156, L173}
> ==5733==    L156 (0x402E790)
> ==5733==     #0  pthread_mutex_lock /mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:935
> ==5733==     #1  remoteDispatchOpenHelper /usr/src/debug/libvirt-0.9.12/daemon/remote.c:683
> ==5733==     #2  virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
> ==5733==     #3  virNetServerHandleJob /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
> ==5733==     #4  virThreadPoolWorker /usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
> ==5733==     #5  virThreadHelper /usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
> ==5733==    L173 (0x402F420)
> ==5733==     #0  pthread_mutex_lock /mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:935
> ==5733==     #1  remoteDispatchOpenHelper /usr/src/debug/libvirt-0.9.12/daemon/remote.c:683
> ==5733==     #2  virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
> ==5733==     #3  virNetServerHandleJob /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
> ==5733==     #4  virThreadPoolWorker /usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
> ==5733==     #5  virThreadHelper /usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
> ==5733==   Race verifier data: 0x347203A572,0x347203A565
> ==5733== }}}
>
>   I didn't know that netcf was doing Relax-NG validation on-the-fly :)
> and hit another one in libxml2, through a libvirt/netcf/libxslt/libxml2
> call:
>
> ==5733== WARNING: Possible data race during read of size 4 at 0x34723B10B4: {{{
> ==5733==    T1 (L{L173}):
> ==5733==     #0  random_r /lib64/libc-2.14.90.so
> ==5733==     #1  random /lib64/libc-2.14.90.so
> ==5733==     #2  rand /lib64/libc-2.14.90.so
> ==5733==     #3  xmlDictCreate /usr/src/debug/libxml2-2.8.0/dict.c:491
> ==5733==     #4  xmlInitParserCtxt /usr/src/debug/libxml2-2.8.0/parserInternals.c:1620
> ==5733==     #5  xmlNewParserCtxt /usr/src/debug/libxml2-2.8.0/parserInternals.c:1874
> ==5733==     #6  xmlCreateURLParserCtxt /usr/src/debug/libxml2-2.8.0/parser.c:13546
> ==5733==     #7  xmlReadFile /usr/src/debug/libxml2-2.8.0/parser.c:14713
> ==5733==     #8  xmlRelaxNGParse /usr/src/debug/libxml2-2.8.0/relaxng.c:7480
> ==5733==     #9  (no symbols) /usr/lib64/libnetcf.so.1.4.0
> ==5733==     #10 ncf_init /usr/lib64/libnetcf.so.1.4.0
> ==5733==     #11 interfaceOpenInterface /usr/src/debug/libvirt-0.9.12/src/interface/netcf_driver.c:144
> ==5733==   Concurrent write(s) happened at (OR AFTER) these points:
> ==5733==    T10 (L{L156}):
> ==5733==     #0  random_r /lib64/libc-2.14.90.so
> ==5733==     #1  random /lib64/libc-2.14.90.so
> ==5733==     #2  rand /lib64/libc-2.14.90.so
> ==5733==     #3  xmlHashCreate /usr/src/debug/libxml2-2.8.0/hash.c:193
> ==5733==     #4  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:4125
> ==5733==     #5  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6380
> ==5733==     #6  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6523
> ==5733==     #7  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6319
> ==5733==     #8  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6520
> ==5733==     #9  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6523
> ==5733==   Address 0x34723B10B4 is 20 bytes inside data symbol "randtbl"
> ==5733==   Locks involved in this report (reporting last lock sites): {L156, L173}
>
>   Looks like another libxml2 bug calling rand(); I should use rand_r()
> instead, if available.

Yep.
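A minimal sketch of what that change could look like; the seeding policy and
names here are hypothetical, not the actual libxml2 patch. The point is just
to make the hidden rand() state explicit as a caller-owned seed and pass it
to the reentrant rand_r():

/* Sketch only: replace rand()'s hidden global state with an explicit,
 * caller-owned seed fed to rand_r().  Seeding and names are hypothetical. */
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* e.g. one seed per parser context / hash table, initialised lazily */
static unsigned int
ctxt_random(unsigned int *seedp)
{
    if (*seedp == 0)
        *seedp = (unsigned int) (time(NULL) ^ getpid());
    return (unsigned int) rand_r(seedp);
}

The caller keeps the seed next to whatever structure needs the randomisation,
so no two threads ever touch the same state.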
>   So except for the parallel xdr_remote_nonnull_string() issue I don't
> think there is anything I can report at this point for libvirt code
> itself.

I think that is bogus.

Regards,
Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|