RE: OSD sometimes stuck in init phase

Thanks for the quick response, Haomai! Please find the backtrace here [1].

[1] - http://paste.openstack.org/show/411139/

Regards,
Unmesh G.
IRC: unmeshg

> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@xxxxxxxxx]
> Sent: Thursday, August 06, 2015 5:31 PM
> To: Gurjar, Unmesh
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: OSD sometimes stuck in init phase
> 
> Could you print the backtraces of all threads via "thread apply all bt"?
> 
> On Thu, Aug 6, 2015 at 7:52 PM, Gurjar, Unmesh <unmesh.gurjar@xxxxxx>
> wrote:
> > Hi,
> >
> > On a Ceph Firefly cluster (version [1]), OSDs are configured to use separate
> > data and journal disks (using the ceph-disk utility). We observe that a few
> > OSDs start up fine (they reach the 'up' and 'in' state); however, others are
> > stuck in the 'init creating/touching snapmapper object' phase. Below is an
> > OSD start-up log snippet:
> >
> > 2015-08-06 08:58:02.491537 7fd312df97c0  1 journal _open
> > /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size
> > 4096 bytes, directio = 1, aio = 1
> > 2015-08-06 08:58:02.498447 7fd312df97c0  1 journal _open
> > /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size
> > 4096 bytes, directio = 1, aio = 1
> > 2015-08-06 08:58:02.498720 7fd312df97c0  2 osd.0 0 boot
> > 2015-08-06 08:58:02.498865 7fd312df97c0 10 osd.0 0 read_superblock
> > sb(2645bbf6-16d0-4c42-8835-8ba9f5c95a1d osd.0
> > a821146f-0742-4724-b4ca-39ea4ccc298d e0 [0,0] lci=[0,0])
> > 2015-08-06 08:58:02.498937 7fd312df97c0 10 osd.0 0 init
> > creating/touching snapmapper object
> >
> > The log statement is misleading, though, since at that point the code is
> > actually initializing the 'infos' object (as can be seen in the source [2]).
> >
> > Upon debugging further, the thread seems to be blocked in a condition-variable
> > wait (pthread_cond_wait) associated with the
> > 'ObjectStore::apply_transaction::my_lock' mutex. Below is the debug trace:
> >
> > (gdb) where
> > #0  0x00007fd3122b708f in pthread_cond_wait@@GLIBC_2.3.2 () from
> > /lib/x86_64-linux-gnu/libpthread.so.0
> > #1  0x00007fd313132bf4 in
> > ObjectStore::apply_transactions(ObjectStore::Sequencer*,
> > std::list<ObjectStore::Transaction*,
> > std::allocator<ObjectStore::Transaction*> >&, Context*) ()
> > #2  0x00007fd313097d08 in
> > ObjectStore::apply_transaction(ObjectStore::Transaction&, Context*) ()
> > #3  0x00007fd313076790 in OSD::init() ()
> > #4  0x00007fd3130233a7 in main ()
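
The trace above is consistent with a "submit, then block until a completion callback fires" pattern: if the callback is never invoked, the caller sits in pthread_cond_wait indefinitely. The following is a minimal sketch of that pattern using only the C++ standard library. It is not Ceph code; submit_and_wait, Completion, and State are hypothetical names chosen for illustration, and the timeout is an addition so that a missing callback shows up as a failure instead of a hang.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a completion callback handed to a worker.
using Completion = std::function<void(int)>;

// Hypothetical helper: hand a completion callback to `submit`, then block
// until the callback fires or `timeout` expires. Returns true and stores the
// callback's value in *result on completion; returns false on timeout.
bool submit_and_wait(const std::function<void(Completion)>& submit,
                     std::chrono::milliseconds timeout, int* result) {
  assert(result != nullptr);

  // Shared state outlives this call, so a late callback cannot touch a
  // destroyed stack frame.
  struct State {
    std::mutex lock;
    std::condition_variable cond;
    bool done = false;
    int r = 0;
  };
  auto st = std::make_shared<State>();

  submit([st](int r) {
    std::lock_guard<std::mutex> g(st->lock);
    st->r = r;
    st->done = true;
    st->cond.notify_all();
  });

  // Without the timeout this would be a plain cond.wait(), which never
  // returns if nobody ever sets `done`.
  std::unique_lock<std::mutex> l(st->lock);
  bool completed = st->cond.wait_for(l, timeout, [&] { return st->done; });
  if (completed) *result = st->r;
  return completed;
}
```

Calling submit_and_wait with a submit function that drops the callback returns false after the timeout, which is the bounded equivalent of the hang seen in frame #0.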
> >
> > In a few cases, restarting the stuck OSD service lets it complete the
> > 'init' phase and reach the 'up' and 'in' state.
> >
> > Any help is greatly appreciated. Please let me know if any more details are
> > needed to find the root cause.
> >
> > [1] - 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> > [2] -  https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L1211
> >
> > Regards,
> > Unmesh G.
> > IRC: unmeshg
> 
> 
> 
> --
> Best Regards,
> 
> Wheat