At 2017-01-11 02:23:54, "John Ferlan" <jferlan@xxxxxxxxxx> wrote:
>
>
>On 12/30/2016 03:39 AM, Chen Hanxiao wrote:
>> From: Chen Hanxiao <chenhanxiao@xxxxxxxxx>
>>
>> This patch fixes a deadlock when trying to read an RBD image.
>>
>> When trying to connect to an RBD server
>> (ceph-0.94.7-1.el7.centos.x86_64),
>> rbd_list/rbd_open enter a deadlocked state.
>>
>> Backtrace:
>> Thread 30 (Thread 0x7fdb342d0700 (LWP 12105)):
>> #0  0x00007fdb40b16705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>> #1  0x00007fdb294273f1 in librados::IoCtxImpl::operate_read(object_t const&, ObjectOperation*, ceph::buffer::list*, int) () from /lib64/librados.so.2
>> #2  0x00007fdb29429fcc in librados::IoCtxImpl::read(object_t const&, ceph::buffer::list&, unsigned long, unsigned long) () from /lib64/librados.so.2
>> #3  0x00007fdb293e850c in librados::IoCtx::read(std::string const&, ceph::buffer::list&, unsigned long, unsigned long) () from /lib64/librados.so.2
>> #4  0x00007fdb2b9dd15e in librbd::list(librados::IoCtx&, std::vector<std::string, std::allocator<std::string> >&) () from /lib64/librbd.so.1
>> #5  0x00007fdb2b98c089 in rbd_list () from /lib64/librbd.so.1
>> #6  0x00007fdb2e1a8052 in virStorageBackendRBDRefreshPool (conn=<optimized out>, pool=0x7fdafc002d50) at storage/storage_backend_rbd.c:366
>> #7  0x00007fdb2e193833 in storagePoolCreate (obj=0x7fdb1c1fd5a0, flags=<optimized out>) at storage/storage_driver.c:876
>> #8  0x00007fdb43790ea1 in virStoragePoolCreate (pool=pool@entry=0x7fdb1c1fd5a0, flags=0) at libvirt-storage.c:695
>> #9  0x00007fdb443becdf in remoteDispatchStoragePoolCreate (server=0x7fdb45fb2ab0, msg=0x7fdb45fb3db0, args=0x7fdb1c0037d0, rerr=0x7fdb342cfc30, client=<optimized out>) at remote_dispatch.h:14383
>> #10 remoteDispatchStoragePoolCreateHelper (server=0x7fdb45fb2ab0, client=<optimized out>, msg=0x7fdb45fb3db0, rerr=0x7fdb342cfc30, args=0x7fdb1c0037d0, ret=0x7fdb1c1b3260) at remote_dispatch.h:14359
>> #11 0x00007fdb437d9c42 in virNetServerProgramDispatchCall (msg=0x7fdb45fb3db0, client=0x7fdb45fd1a80, server=0x7fdb45fb2ab0, prog=0x7fdb45fcd670) at rpc/virnetserverprogram.c:437
>> #12 virNetServerProgramDispatch (prog=0x7fdb45fcd670, server=server@entry=0x7fdb45fb2ab0, client=0x7fdb45fd1a80, msg=0x7fdb45fb3db0) at rpc/virnetserverprogram.c:307
>> #13 0x00007fdb437d4ebd in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7fdb45fb2ab0) at rpc/virnetserver.c:135
>> #14 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7fdb45fb2ab0) at rpc/virnetserver.c:156
>> #15 0x00007fdb436cfb35 in virThreadPoolWorker (opaque=opaque@entry=0x7fdb45fa7650) at util/virthreadpool.c:145
>> #16 0x00007fdb436cf058 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
>> #17 0x00007fdb40b12df5 in start_thread () from /lib64/libpthread.so.0
>> #18 0x00007fdb408401ad in clone () from /lib64/libc.so.6
>>
>> 366         len = rbd_list(ptr.ioctx, names, &max_size);
>> (gdb) n
>> [New Thread 0x7fdb20758700 (LWP 22458)]
>> [New Thread 0x7fdb20556700 (LWP 22459)]
>> [Thread 0x7fdb20758700 (LWP 22458) exited]
>> [New Thread 0x7fdb20455700 (LWP 22460)]
>> [Thread 0x7fdb20556700 (LWP 22459) exited]
>> [New Thread 0x7fdb20556700 (LWP 22461)]
>>
>> infinite loop...
>>
>> Signed-off-by: Chen Hanxiao <chenhanxiao@xxxxxxxxx>
>> ---
>>  src/storage/storage_backend_rbd.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>
>Could you provide a bit more context...
>
>Why does calling rados_conf_read_file with a NULL resolve the issue?
>
>Is this something "new" or "expected"? And if expected, why are we only
>seeing it now?
>
>What is the other thread that "has" the lock doing?

It seems that the ceph server side does not respond to our request,
so when libvirt calls rbd_open/rbd_list, etc., they never return.

But qemu works fine, so I took qemu's code as a reference:
https://github.com/qemu/qemu/blob/master/block/rbd.c#L365

rados_conf_read_file with a NULL path will try to read the ceph conf file
from /etc/ceph and the other default locations.

Although we call rados_conf_set in the following code, without
rados_conf_read_file, ceph-0.94.7-1.el7 does not answer our rbd_open.
Some older and newer ceph servers do not have this issue.

I think this may be a server-side bug in ceph-0.94.7-1.el7.
Calling rados_conf_read_file(cluster, NULL) will make our code more robust
(a standalone sketch of the connection sequence I have in mind is at the
bottom of this mail, below the quoted patch).

Regards,
- Chen

>
>From my cursory/quick read of:
>
>http://docs.ceph.com/docs/master/rados/api/librados/
>
>...
>"Then you configure your rados_t to connect to your cluster, either by
>setting individual values (rados_conf_set()), using a configuration file
>(rados_conf_read_file()), using command line options
>(rados_conf_parse_argv()), or an environment variable
>(rados_conf_parse_env()):"
>
>Since we use rados_conf_set, that would seem to indicate we're OK. It's
>not clear from just what's posted why eventually calling rbd_list is
>causing a hang.
>
>I don't have the cycles or environment to do the research right now and
>it really isn't clear why a read_file would resolve the issue.
>
>John
>> diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c
>> index b1c51ab..233737b 100644
>> --- a/src/storage/storage_backend_rbd.c
>> +++ b/src/storage/storage_backend_rbd.c
>> @@ -95,6 +95,9 @@ virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr,
>>          goto cleanup;
>>      }
>>
>> +    /* try default location, but ignore failure */
>> +    rados_conf_read_file(ptr->cluster, NULL);
>> +
>>      if (!conn) {
>>          virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>>                         _("'ceph' authentication not supported "
>> @@ -124,6 +127,10 @@ virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr,
>>                         _("failed to create the RADOS cluster"));
>>          goto cleanup;
>>      }
>> +
>> +    /* try default location, but ignore failure */
>> +    rados_conf_read_file(ptr->cluster, NULL);
>> +
>>      if (virStorageBackendRBDRADOSConfSet(ptr->cluster,
>>                                           "auth_supported", "none") < 0)
>>          goto cleanup;
>>
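P.S. As mentioned above, here is a minimal standalone sketch (not the libvirt
code itself) of the connection sequence the patch is aiming for: load the
default ceph.conf first, then override individual options with rados_conf_set,
then connect. The monitor address and option values are made-up placeholders,
and error handling is reduced to the minimum.

#include <stdio.h>
#include <rados/librados.h>

int
main(void)
{
    rados_t cluster = NULL;

    if (rados_create(&cluster, NULL) < 0)
        return 1;

    /* Try /etc/ceph/ceph.conf and the other default locations first;
     * a failure here is deliberately ignored, matching the patch and
     * qemu's block/rbd.c. */
    rados_conf_read_file(cluster, NULL);

    /* Explicit settings still override whatever the conf file provided;
     * the values below are placeholders, not real cluster settings. */
    rados_conf_set(cluster, "mon_host", "192.0.2.1:6789");
    rados_conf_set(cluster, "auth_supported", "none");

    if (rados_connect(cluster) < 0) {
        fprintf(stderr, "rados_connect failed\n");
        rados_shutdown(cluster);
        return 1;
    }

    /* rados_ioctx_create() / rbd_list() would follow here */

    rados_shutdown(cluster);
    return 0;
}

If this sketch stops hanging against 0.94.7 only when the
rados_conf_read_file() line is present, that would support the theory that
the missing default conf is what triggers the server-side hang.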