The hammer release is nearly end-of-life pending the release of Luminous. I wouldn't say it's a bug so much as a consequence of timing out RADOS operations -- as I stated before, you most likely have another thread stuck waiting on the cluster while that lock is held, but you only provided the backtrace for a single thread.

On Tue, Aug 8, 2017 at 2:34 AM, Shilu <shi.lu@xxxxxxx> wrote:
> rbd_data.259fe1073f804.0000000000000929 925696~4096 should_complete: r = -110
> This is the timeout log; I have included a few log entries.
>
> I stopped ceph with "ceph osd pause" and then ran "ceph osd unpause". I use librbd through tgt; this causes a tgt thread to hang, and eventually tgt can no longer write data to ceph.
>
> I tested this on ceph 10.2.5 and it works well, so I think librbd has a bug in ceph 0.94.5.
>
> My ceph.conf sets:
> rados_mon_op_timeout = 75
> rados_osd_op_timeout = 75
> client_mount_timeout = 75
>
> -----Original Message-----
> From: Jason Dillaman [mailto:jdillama@xxxxxxxxxx]
> Sent: August 8, 2017 7:58
> To: shilu 09816 (RD)
> Cc: ceph-users
> Subject: Re: hammer(0.94.5) librbd dead lock,i want to how to resolve
>
> I am not sure what you mean by "I stop ceph" (stopped all the OSDs?) -- and I am not sure how you are seeing ETIMEDOUT errors on an "rbd_write" call, since it should just block, assuming you are referring to stopping the OSDs. What is your use case? Are you developing your own application on top of librbd?
>
> Regardless, I can only assume there is another thread that is blocked while it owns the librbd::ImageCtx::owner_lock.
>
> On Mon, Aug 7, 2017 at 8:35 AM, Shilu <shi.lu@xxxxxxx> wrote:
>> I write data with rbd_write. When I stop ceph, rbd_write times out and returns -110.
>>
>> Then I call rbd_write again and it deadlocks; the stack is shown below:
>>
>> #0  pthread_rwlock_rdlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S:87
>> #1  0x00007fafbf9f75a0 in RWLock::get_read (this=0x7fafc48e1198) at ./common/RWLock.h:76
>> #2  0x00007fafbfa31de0 in RLocker (lock=..., this=<synthetic pointer>) at ./common/RWLock.h:130
>> #3  librbd::aio_write (ictx=0x7fafc48e1000, off=71516229632, len=4096, buf=0x7fafc499e000 "\235?[\257\367n\255\263?\200\034\061\341\r", c=0x7fafab44ef80, op_flags=0) at librbd/internal.cc:3320
>> #4  0x00007fafbf9eff19 in Context::complete (this=0x7fafab4174c0, r=<optimized out>) at ./include/Context.h:65
>> #5  0x00007fafbfb00016 in ThreadPool::worker (this=0x7fafc4852c40, wt=0x7fafc4948550) at common/WorkQueue.cc:128
>> #6  0x00007fafbfb010b0 in ThreadPool::WorkThread::entry (this=<optimized out>) at common/WorkQueue.h:408
>> #7  0x00007fafc59b6184 in start_thread (arg=0x7fafadbed700) at pthread_create.c:312
>> #8  0x00007fafc52aaffd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>
>> This e-mail and its attachments contain confidential information from New H3C, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited.
>> If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!
>
> --
> Jason

--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com