Thanks for the quick response, Haomai!

Please find the backtrace here [1].

[1] - http://paste.openstack.org/show/411139/

Regards,
Unmesh G.
IRC: unmeshg

> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@xxxxxxxxx]
> Sent: Thursday, August 06, 2015 5:31 PM
> To: Gurjar, Unmesh
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: OSD sometimes stuck in init phase
>
> Could you print the backtraces of all your threads via "thread apply all bt"?
>
> On Thu, Aug 6, 2015 at 7:52 PM, Gurjar, Unmesh <unmesh.gurjar@xxxxxx> wrote:
> > Hi,
> >
> > On a Ceph Firefly cluster (version [1]), OSDs are configured to use separate data and journal disks (using the ceph-disk utility). We observe that a few OSDs start up fine (they reach the 'up' and 'in' state); however, others are stuck in the 'init creating/touching snapmapper object' phase. Below is an OSD start-up log snippet:
> >
> > 2015-08-06 08:58:02.491537 7fd312df97c0 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 1
> > 2015-08-06 08:58:02.498447 7fd312df97c0 1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 1
> > 2015-08-06 08:58:02.498720 7fd312df97c0 2 osd.0 0 boot
> > 2015-08-06 08:58:02.498865 7fd312df97c0 10 osd.0 0 read_superblock sb(2645bbf6-16d0-4c42-8835-8ba9f5c95a1d osd.0 a821146f-0742-4724-b4ca-39ea4ccc298d e0 [0,0] lci=[0,0])
> > 2015-08-06 08:58:02.498937 7fd312df97c0 10 osd.0 0 init creating/touching snapmapper object
> >
> > The log statement is inaccurate, though, since it is actually performing the init operation for the 'infos' object (as can be seen in the source [2]).
> >
> > Upon debugging further, the thread seems to be waiting to acquire the 'ObjectStore::apply_transaction::my_lock' mutex.
> > Below is the debug trace:
> >
> > (gdb) where
> > #0 0x00007fd3122b708f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
> > #1 0x00007fd313132bf4 in ObjectStore::apply_transactions(ObjectStore::Sequencer*, std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, Context*) ()
> > #2 0x00007fd313097d08 in ObjectStore::apply_transaction(ObjectStore::Transaction&, Context*) ()
> > #3 0x00007fd313076790 in OSD::init() ()
> > #4 0x00007fd3130233a7 in main ()
> >
> > In a few cases, restarting the stuck OSD (service) lets it complete the 'init' phase successfully and reach the 'up' and 'in' state!
> >
> > Any help is greatly appreciated. Please let me know if any more details are required to find the root cause.
> >
> > [1] - 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> > [2] - https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L1211
> >
> > Regards,
> > Unmesh G.
> > IRC: unmeshg
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Best Regards,
>
> Wheat
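
A note on the quoted trace: frame #0 is pthread_cond_wait, which suggests the init thread is blocked on a condition variable until the submitted transaction completes, rather than contending for the mutex itself. The sketch below illustrates that general blocking pattern in standard C++. It is not Ceph's code: `apply_transaction_sync` and the `work` callback are hypothetical names, and the worker thread is only a stand-in for whatever asynchronous path eventually signals completion.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical synchronous wrapper: hand `work` to another thread, then
// block on a condition variable until that thread reports completion.
int apply_transaction_sync(const std::function<int()>& work) {
    std::mutex my_lock;
    std::condition_variable my_cond;
    bool done = false;
    int result = 0;

    // Stand-in for the asynchronous path that applies the transaction
    // and then fires a completion callback.
    std::thread worker([&] {
        int r = work();
        std::lock_guard<std::mutex> guard(my_lock);
        result = r;
        done = true;
        my_cond.notify_one();
    });

    {
        // The caller parks here. If `done` is never set, a debugger shows
        // this thread inside pthread_cond_wait on Linux.
        std::unique_lock<std::mutex> lock(my_lock);
        my_cond.wait(lock, [&] { return done; });
    }

    worker.join();
    return result;
}
```

Under this assumption, a caller that never returns points at the completion side (the thread expected to set `done`), which is why a backtrace of all threads is more informative than the init thread's stack alone.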