OSD sometimes stuck in init phase

Hi,

On a Ceph Firefly cluster (version [1]), the OSDs are configured with separate data and journal disks (set up using the ceph-disk utility). Some OSDs start up fine (they reach the 'up' and 'in' state), but others get stuck in the 'init creating/touching snapmapper object' phase. Below is an OSD start-up log snippet from a stuck OSD:

2015-08-06 08:58:02.491537 7fd312df97c0  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-08-06 08:58:02.498447 7fd312df97c0  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 21: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 1
2015-08-06 08:58:02.498720 7fd312df97c0  2 osd.0 0 boot
2015-08-06 08:58:02.498865 7fd312df97c0 10 osd.0 0 read_superblock sb(2645bbf6-16d0-4c42-8835-8ba9f5c95a1d osd.0 a821146f-0742-4724-b4ca-39ea4ccc298d e0 [0,0] lci=[0,0])
2015-08-06 08:58:02.498937 7fd312df97c0 10 osd.0 0 init creating/touching snapmapper object

The last log message is misleading, though: at that point the OSD is actually initializing the 'infos' object (as can be seen in the source [2]).

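On debugging further, the thread appears to be blocked in ObjectStore::apply_transactions(), in a pthread_cond_wait() associated with the 'ObjectStore::apply_transaction::my_lock' mutex. In other words, it seems to be waiting for the queued transaction to complete rather than contending for the mutex itself. Below is the gdb backtrace: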

(gdb) where
#0  0x00007fd3122b708f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fd313132bf4 in ObjectStore::apply_transactions(ObjectStore::Sequencer*, std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, Context*) ()
#2  0x00007fd313097d08 in ObjectStore::apply_transaction(ObjectStore::Transaction&, Context*) ()
#3  0x00007fd313076790 in OSD::init() ()
#4  0x00007fd3130233a7 in main ()

In a few cases, restarting the stuck OSD service lets it complete the 'init' phase and reach the 'up' and 'in' state.

Any help is greatly appreciated. Please let me know if more details are needed to find the root cause.

[1] - 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
[2] -  https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L1211

Regards,
Unmesh G.
IRC: unmeshg