On 12/30/2016 03:39 AM, Chen Hanxiao wrote:
> From: Chen Hanxiao <chenhanxiao@xxxxxxxxx>
>
> This patch fix a dead lock when try to read a rbd image
>
> When trying to connect a rbd server
> (ceph-0.94.7-1.el7.centos.x86_64),
>
> rbd_list/rbd_open enter a dead lock state.
>
> Backtrace:
> Thread 30 (Thread 0x7fdb342d0700 (LWP 12105)):
> #0 0x00007fdb40b16705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1 0x00007fdb294273f1 in librados::IoCtxImpl::operate_read(object_t const&, ObjectOperation*, ceph::buffer::list*, int) () from /lib64/librados.so.2
> #2 0x00007fdb29429fcc in librados::IoCtxImpl::read(object_t const&, ceph::buffer::list&, unsigned long, unsigned long) () from /lib64/librados.so.2
> #3 0x00007fdb293e850c in librados::IoCtx::read(std::string const&, ceph::buffer::list&, unsigned long, unsigned long) () from /lib64/librados.so.2
> #4 0x00007fdb2b9dd15e in librbd::list(librados::IoCtx&, std::vector<std::string, std::allocator<std::string> >&) () from /lib64/librbd.so.1
> #5 0x00007fdb2b98c089 in rbd_list () from /lib64/librbd.so.1
> #6 0x00007fdb2e1a8052 in virStorageBackendRBDRefreshPool (conn=<optimized out>, pool=0x7fdafc002d50) at storage/storage_backend_rbd.c:366
> #7 0x00007fdb2e193833 in storagePoolCreate (obj=0x7fdb1c1fd5a0, flags=<optimized out>) at storage/storage_driver.c:876
> #8 0x00007fdb43790ea1 in virStoragePoolCreate (pool=pool@entry=0x7fdb1c1fd5a0, flags=0) at libvirt-storage.c:695
> #9 0x00007fdb443becdf in remoteDispatchStoragePoolCreate (server=0x7fdb45fb2ab0, msg=0x7fdb45fb3db0, args=0x7fdb1c0037d0, rerr=0x7fdb342cfc30, client=<optimized out>) at remote_dispatch.h:14383
> #10 remoteDispatchStoragePoolCreateHelper (server=0x7fdb45fb2ab0, client=<optimized out>, msg=0x7fdb45fb3db0, rerr=0x7fdb342cfc30, args=0x7fdb1c0037d0, ret=0x7fdb1c1b3260) at remote_dispatch.h:14359
> #11 0x00007fdb437d9c42 in virNetServerProgramDispatchCall (msg=0x7fdb45fb3db0, client=0x7fdb45fd1a80, server=0x7fdb45fb2ab0,
prog=0x7fdb45fcd670) at rpc/virnetserverprogram.c:437
> #12 virNetServerProgramDispatch (prog=0x7fdb45fcd670, server=server@entry=0x7fdb45fb2ab0, client=0x7fdb45fd1a80, msg=0x7fdb45fb3db0) at rpc/virnetserverprogram.c:307
> #13 0x00007fdb437d4ebd in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7fdb45fb2ab0) at rpc/virnetserver.c:135
> #14 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7fdb45fb2ab0) at rpc/virnetserver.c:156
> #15 0x00007fdb436cfb35 in virThreadPoolWorker (opaque=opaque@entry=0x7fdb45fa7650) at util/virthreadpool.c:145
> #16 0x00007fdb436cf058 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
> #17 0x00007fdb40b12df5 in start_thread () from /lib64/libpthread.so.0
> #18 0x00007fdb408401ad in clone () from /lib64/libc.so.6
>
> 366 len = rbd_list(ptr.ioctx, names, &max_size);
> (gdb) n
> [New Thread 0x7fdb20758700 (LWP 22458)]
> [New Thread 0x7fdb20556700 (LWP 22459)]
> [Thread 0x7fdb20758700 (LWP 22458) exited]
> [New Thread 0x7fdb20455700 (LWP 22460)]
> [Thread 0x7fdb20556700 (LWP 22459) exited]
> [New Thread 0x7fdb20556700 (LWP 22461)]
>
> infinite loop...
>
> Signed-off-by: Chen Hanxiao <chenhanxiao@xxxxxxxxx>
> ---
> src/storage/storage_backend_rbd.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>

Could you provide a bit more context... Why does calling
rados_conf_read_file with a NULL resolve the issue? Is this something
"new" or "expected"? And if expected, why are we only seeing it now?
What is the other thread that "has" the lock doing?

From my cursory/quick read of:

http://docs.ceph.com/docs/master/rados/api/librados/

...

"Then you configure your rados_t to connect to your cluster, either by
setting individual values (rados_conf_set()), using a configuration file
(rados_conf_read_file()), using command line options
(rados_conf_parse_argv()), or an environment variable
(rados_conf_parse_env()):"

Since we use rados_conf_set, that would seem to indicate we're OK.
It's not clear from just what's posted why eventually calling rbd_list
is causing a hang.

I don't have the cycles or environment to do the research right now and
it really isn't clear why a read_file would resolve the issue.

John

> diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c
> index b1c51ab..233737b 100644
> --- a/src/storage/storage_backend_rbd.c
> +++ b/src/storage/storage_backend_rbd.c
> @@ -95,6 +95,9 @@ virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr,
>          goto cleanup;
>      }
>
> +    /* try default location, but ignore failure */
> +    rados_conf_read_file(ptr->cluster, NULL);
> +
>      if (!conn) {
>          virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>                         _("'ceph' authentication not supported "
> @@ -124,6 +127,10 @@ virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr,
>                         _("failed to create the RADOS cluster"));
>          goto cleanup;
>      }
> +
> +    /* try default location, but ignore failure */
> +    rados_conf_read_file(ptr->cluster, NULL);
> +
>      if (virStorageBackendRBDRADOSConfSet(ptr->cluster,
>                                           "auth_supported", "none") < 0)
>          goto cleanup;
>

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
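
For reference, the ordering the patch sets up (load the default config
file first, ignoring failure, then apply explicit options) can be
modelled in plain C. This is only a sketch: struct conf, conf_set(),
conf_get() and conf_read_file() are hypothetical stand-ins, not librados
calls, and it assumes a last-writer-wins option table, which nothing in
this thread confirms for librados. It does not explain the hang; it only
shows which values end up in effect once a file read precedes the
explicit settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CONF_MAX 16   /* maximum number of options in the toy table */
#define CONF_LEN 64   /* maximum key/value length, including NUL */

/* Hypothetical stand-in for a cluster handle's option table (NOT librados). */
struct conf {
    char key[CONF_MAX][CONF_LEN];
    char val[CONF_MAX][CONF_LEN];
    size_t n;
};

/* Set or overwrite one option. Returns 0 on success, -1 if it does not fit. */
static int conf_set(struct conf *c, const char *key, const char *val)
{
    assert(c && key && val);
    if (strlen(key) >= CONF_LEN || strlen(val) >= CONF_LEN)
        return -1;
    for (size_t i = 0; i < c->n; i++) {
        if (strcmp(c->key[i], key) == 0) {
            strcpy(c->val[i], val);   /* assumed: last writer wins */
            return 0;
        }
    }
    if (c->n == CONF_MAX)
        return -1;
    strcpy(c->key[c->n], key);
    strcpy(c->val[c->n], val);
    c->n++;
    return 0;
}

/* Look up an option. Returns NULL when the key was never set. */
static const char *conf_get(const struct conf *c, const char *key)
{
    for (size_t i = 0; i < c->n; i++)
        if (strcmp(c->key[i], key) == 0)
            return c->val[i];
    return NULL;
}

/*
 * Load "key = value" lines from path. Returns -1 if the file cannot be
 * opened or an entry does not fit; a caller may ignore that, mirroring
 * the "try default location, but ignore failure" comment in the patch.
 */
static int conf_read_file(struct conf *c, const char *path)
{
    char key[CONF_LEN], val[CONF_LEN];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fscanf(f, " %63[^= \n] = %63s", key, val) == 2) {
        if (conf_set(c, key, val) < 0) {
            fclose(f);
            return -1;
        }
    }
    fclose(f);
    return 0;
}
```

With a file containing "auth_supported = cephx", calling
conf_read_file() and then conf_set(&c, "auth_supported", "none") leaves
"none" in effect while the other keys from the file stay loaded; a
failed read leaves the table untouched.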