Could you print the backtraces of all threads via "thread apply all bt"?

On Thu, Aug 6, 2015 at 7:52 PM, Gurjar, Unmesh <unmesh.gurjar@xxxxxx> wrote:
> Hi,
>
> On a Ceph Firefly cluster (version [1]), OSDs are configured to use separate data and journal disks (set up with the ceph-disk utility). We observe that some OSDs start up fine (reaching the 'up' and 'in' state); however, others are stuck in the 'init creating/touching snapmapper object' phase. Below is an OSD start-up log snippet:
>
> 2015-08-06 08:58:02.491537 7fd312df97c0 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 1
> 2015-08-06 08:58:02.498447 7fd312df97c0 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 1
> 2015-08-06 08:58:02.498720 7fd312df97c0 2 osd.0 0 boot
> 2015-08-06 08:58:02.498865 7fd312df97c0 10 osd.0 0 read_superblock sb(2645bbf6-16d0-4c42-8835-8ba9f5c95a1d osd.0 a821146f-0742-4724-b4ca-39ea4ccc298d e0 [0,0] lci=[0,0])
> 2015-08-06 08:58:02.498937 7fd312df97c0 10 osd.0 0 init creating/touching snapmapper object
>
> The log statement is misleading, though: at that point the code is actually running the init operation for the 'infos' object (as can be seen in the source [2]).
>
> Upon debugging further, the thread appears to be blocked in ObjectStore::apply_transaction(), waiting on the condition variable paired with the 'ObjectStore::apply_transaction::my_lock' mutex.
> Below is the debug trace:
>
> (gdb) where
> #0 0x00007fd3122b708f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00007fd313132bf4 in ObjectStore::apply_transactions(ObjectStore::Sequencer*, std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, Context*) ()
> #2 0x00007fd313097d08 in ObjectStore::apply_transaction(ObjectStore::Transaction&, Context*) ()
> #3 0x00007fd313076790 in OSD::init() ()
> #4 0x00007fd3130233a7 in main ()
>
> In a few cases, restarting the stuck OSD service lets it complete the 'init' phase and reach the 'up' and 'in' state.
>
> Any help is greatly appreciated. Please let me know if more details are needed to find the root cause.
>
> [1] - 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> [2] - https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L1211
>
> Regards,
> Unmesh G.
> IRC: unmeshg
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Best Regards,
Wheat
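Frame #0 of the trace above is pthread_cond_wait(), called from ObjectStore::apply_transactions(). That suggests the OSD::init() thread is sleeping on a condition variable (which releases its mutex while waiting) until some other thread reports the transaction as applied, rather than contending for the mutex itself. If that completion never arrives, the caller sleeps forever, which would also be consistent with a restart sometimes getting past the hang. The C++ sketch below illustrates this general "synchronous wrapper over an asynchronous queue" pattern under stated assumptions: AsyncStore, queue_transaction() and apply_transaction_sync() are hypothetical names, not Ceph's actual code, and the timeout is an illustrative addition (frame #0 is an untimed pthread_cond_wait, not pthread_cond_timedwait).

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical stand-in for an asynchronous object-store backend (NOT Ceph's API).
// queue_transaction() hands the work to a worker thread, which later invokes the
// completion callback with a result code.
class AsyncStore {
 public:
  explicit AsyncStore(bool stalled) : stalled_(stalled) {}
  ~AsyncStore() {
    for (auto& t : workers_) t.join();
  }

  void queue_transaction(std::function<void(int)> on_applied) {
    assert(on_applied && "completion callback required");
    if (stalled_) return;  // simulate a backend whose completion never fires
    workers_.emplace_back([cb = std::move(on_applied)] {
      std::this_thread::sleep_for(std::chrono::milliseconds(10));  // pretend to apply
      cb(0);  // 0 == success
    });
  }

 private:
  bool stalled_;
  std::vector<std::thread> workers_;
};

// Hypothetical synchronous wrapper: queue the work, then sleep on a condition
// variable until the completion callback sets `done`. Returns std::nullopt if
// the completion did not arrive within `timeout`.
std::optional<int> apply_transaction_sync(AsyncStore& store,
                                          std::chrono::milliseconds timeout) {
  // Shared with the callback via shared_ptr, so a late callback after a
  // timeout never touches a destroyed stack frame.
  struct State {
    std::mutex lock;
    std::condition_variable cond;
    bool done = false;
    int result = 0;
  };
  auto st = std::make_shared<State>();

  store.queue_transaction([st](int r) {
    std::lock_guard<std::mutex> guard(st->lock);
    st->result = r;
    st->done = true;
    st->cond.notify_all();
  });

  std::unique_lock<std::mutex> guard(st->lock);
  // An untimed st->cond.wait(guard, pred) would block forever if the callback
  // never runs; wait_for() releases the lock while sleeping and gives up instead.
  if (!st->cond.wait_for(guard, timeout, [&] { return st->done; })) {
    return std::nullopt;
  }
  return st->result;
}
```

Against a healthy AsyncStore, apply_transaction_sync() returns the callback's result once the worker finishes; against a stalled one, it returns std::nullopt after the timeout instead of hanging the way the untimed wait in frame #0 does. The "thread apply all bt" output requested above would show what the other threads (the ones expected to deliver that completion) are doing.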