For urgent help: OSD down under heavier workload

Dear Ceph folks,

Can someone with more Ceph expertise help out?

I am running a small Ceph cluster of 3 nodes on Luminous 12.2.12. On one node with a weaker CPU, I have to reweight the OSDs to keep them alive. If the number of PGs on one of these OSDs exceeds roughly 40, the OSD restarts continuously (goes down) and never comes back up successfully. I checked the CPU usage and it is actually not overloaded.

What could be the root problem?


Best regards,

Samuel

***********************************************************************************************************************************************
lags = delete
   -42> 2020-03-17 17:13:53.374486 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbf69ccd:::rbd_data.128dd6b8b4567.0000000000003e96:head have 217'1230 flags = delete tried to add 217'1230 flags = delete
   -41> 2020-03-17 17:13:53.374502 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbf72b1e:::rbd_data.128f46b8b4567.00000000000005e4:head have 217'1103 flags = delete tried to add 217'1103 flags = delete
   -40> 2020-03-17 17:13:53.374518 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbf91ecf:::rbd_data.8f3b66b8b4567.000000000000038e:head have 3093'11873 flags = delete tried to add 3093'11873 flags = delete
   -39> 2020-03-17 17:13:53.374533 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbf9b36e:::rbd_data.2a0286b8b4567.0000000000000176:head have 1619'2253 flags = delete tried to add 1619'2253 flags = delete
   -38> 2020-03-17 17:13:53.374549 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbf9dcdd:::rbd_data.128f46b8b4567.00000000000019b8:head have 217'1145 flags = delete tried to add 217'1145 flags = delete
   -37> 2020-03-17 17:13:53.374565 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbfa7411:::rbd_data.73a8f6b8b4567.0000000000000231:head have 1934'8269 flags = delete tried to add 1934'8269 flags = delete
   -36> 2020-03-17 17:13:53.374581 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbfc02e9:::rbd_data.128dd6b8b4567.00000000000002d0:head have 217'1111 flags = delete tried to add 217'1111 flags = delete
   -35> 2020-03-17 17:13:53.374597 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbfccec4:::rbd_data.128f46b8b4567.0000000000000ea6:head have 217'1125 flags = delete tried to add 217'1125 flags = delete
   -34> 2020-03-17 17:13:53.374612 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbfcf45a:::rbd_data.74a826b8b4567.000000000000004a:head have 1934'10159 flags = none tried to add 1934'10159 flags = none
   -33> 2020-03-17 17:13:53.374628 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbfeb6f9:::rbd_data.8d7246b8b4567.00000000000042a5:head have 4114'12725 flags = none tried to add 4114'12725 flags = none
   -32> 2020-03-17 17:13:53.374643 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbfeb9ce:::rbd_data.6e6fa6b8b4567.0000000000000093:head have 1934'10160 flags = none tried to add 1934'10160 flags = none
   -31> 2020-03-17 17:13:53.374659 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbff1dd0:::rbd_data.128dd6b8b4567.0000000000004332:head have 217'1240 flags = delete tried to add 217'1240 flags = delete
   -30> 2020-03-17 17:13:53.374675 201a0afdb40  0 0x201d403edd0 6.d3 unexpected need for 6:cbffd7b5:::rbd_data.128dd6b8b4567.0000000000004330:head have 217'1239 flags = delete tried to add 217'1239 flags = delete
   -29> 2020-03-17 17:13:54.738477 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18c0b4c0:::rbd_data.9177446e87ccd.0000000000000b1d:head have 4711'802 flags = none tried to add 4711'802 flags = none
   -28> 2020-03-17 17:13:54.738516 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18c185d7:::rbd_data.589572ae8944a.0000000000000c3a:head have 1805'339 flags = delete tried to add 1805'339 flags = delete
   -27> 2020-03-17 17:13:54.738535 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18c1b3ab:::rbd_data.8922074b0dc51.0000000000003ea1:head have 3099'603 flags = delete tried to add 3099'603 flags = delete
   -26> 2020-03-17 17:13:54.738552 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18c28113:::rbd_data.9117a327b23c6.00000000000002bd:head have 3122'646 flags = none tried to add 3122'646 flags = none
   -25> 2020-03-17 17:13:54.738568 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18c2926a:::rbd_data.589572ae8944a.000000000000022d:head have 1805'337 flags = delete tried to add 1805'337 flags = delete
   -24> 2020-03-17 17:13:54.738584 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18c2bb8b:::rbd_data.5f15f625558ec.000000000000027f:head have 1913'383 flags = none tried to add 1913'383 flags = none
   -23> 2020-03-17 17:13:54.738600 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18c3ca0e:::rbd_data.8a738625558ec.0000000000005ca0:head have 3298'654 flags = none tried to add 3298'654 flags = none
   -22> 2020-03-17 17:13:54.738615 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18c4892e:::rbd_data.8ff1766334873.0000000000009ea0:head have 3092'523 flags = delete tried to add 3092'523 flags = delete
   -21> 2020-03-17 17:13:54.738631 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18c4ce18:::rbd_data.9177446e87ccd.0000000000000d58:head have 4750'803 flags = none tried to add 4750'803 flags = none
   -20> 2020-03-17 17:13:54.738647 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18c675ec:::rbd_data.8ff1766334873.00000000000018a2:head have 3092'468 flags = delete tried to add 3092'468 flags = delete
   -19> 2020-03-17 17:13:54.738662 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18cbb5d5:::rbd_data.9117a327b23c6.00000000000028a6:head have 3126'652 flags = none tried to add 3126'652 flags = none
   -18> 2020-03-17 17:13:54.738678 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18d2a04f:::rbd_data.110706b8b4567.0000000000000662:head have 1601'112 flags = none tried to add 1601'112 flags = none
   -17> 2020-03-17 17:13:54.738693 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18d424df:::rbd_data.9177446e87ccd.0000000000001ca7:head have 3122'649 flags = none tried to add 3122'649 flags = none
   -16> 2020-03-17 17:13:54.738708 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18d43cd1:::rbd_data.589572ae8944a.0000000000003a12:head have 1805'350 flags = delete tried to add 1805'350 flags = delete
   -15> 2020-03-17 17:13:54.738724 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18d91ef4:::rbd_data.589572ae8944a.0000000000004d34:head have 1805'354 flags = delete tried to add 1805'354 flags = delete
   -14> 2020-03-17 17:13:54.738739 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18e2953a:::rbd_data.5f15f625558ec.00000000000089d0:head have 1913'384 flags = none tried to add 1913'384 flags = none
   -13> 2020-03-17 17:13:54.738755 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18e514a7:::rbd_data.8f17c327b23c6.00000000000040a3:head have 2962'410 flags = delete tried to add 2962'410 flags = delete
   -12> 2020-03-17 17:13:54.738770 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18ea2e7b:::rbd_data.9177446e87ccd.000000000000095c:head have 4711'801 flags = none tried to add 4711'801 flags = none
   -11> 2020-03-17 17:13:54.738785 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18ed4244:::rbd_data.8ff1766334873.00000000000006a2:head have 3092'456 flags = delete tried to add 3092'456 flags = delete
   -10> 2020-03-17 17:13:54.738801 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18f0ccd9:::rbd_data.589572ae8944a.00000000000045e7:head have 1805'353 flags = delete tried to add 1805'353 flags = delete
    -9> 2020-03-17 17:13:54.738816 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18f69445:::rbd_data.589572ae8944a.0000000000000229:head have 1805'336 flags = delete tried to add 1805'336 flags = delete
    -8> 2020-03-17 17:13:54.738833 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18f88400:::rbd_data.589572ae8944a.0000000000004e31:head have 1805'355 flags = delete tried to add 1805'355 flags = delete
    -7> 2020-03-17 17:13:54.738848 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18f8e666:::rbd_data.9177446e87ccd.0000000000000387:head have 4416'800 flags = none tried to add 4416'800 flags = none
    -6> 2020-03-17 17:13:54.738864 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18fae8ee:::rbd_data.9177446e87ccd.0000000000005aa6:head have 3330'657 flags = none tried to add 3330'657 flags = none
    -5> 2020-03-17 17:13:54.738880 201a12fdb40  0 0x201d4123390 4.318 unexpected need for 4:18fdcb08:::rbd_data.110706b8b4567.0000000000001401:head have 1601'113 flags = none tried to add 1601'113 flags = none
    -4> 2020-03-17 17:14:19.091232 2000f81db40  0 -- 192.168.230.122:6812/171483 >> 192.168.230.11:0/1871093011 conn(0x20014cce840 :6812 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging authorizer
    -3> 2020-03-17 17:14:19.106328 2000ec51b40  0 -- 192.168.230.122:6812/171483 >> 192.168.230.12:0/4148513923 conn(0x20014109640 :6812 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging authorizer
    -2> 2020-03-17 17:14:19.155177 2000e3ddb40  0 -- 192.168.230.122:6812/171483 >> 192.168.230.13:0/768399090 conn(0x200141ac4d0 :6812 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging authorizer
    -1> 2020-03-17 17:14:36.138777 2000f81db40  0 -- 192.168.230.122:6812/171483 >> 192.168.230.202:0/3091162658 conn(0x200147d1f90 :6812 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging authorizer
     0> 2020-03-17 17:15:06.927121 201a6afdb40 -1 *** Caught signal (Bus error) **
 in thread 201a6afdb40 thread_name:tp_osd_tp

 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0x145882c) [0x2000245882c]
 2: (()+0x19890) [0x2000d281890]
 3: (BlueStore::ExtentMap::reshard(KeyValueDB*, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x2df0) [0x2000229da60]
 4: (BlueStore::_txc_write_nodes(BlueStore::TransContext*, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x218) [0x2000229f888]
 5: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x71c) [0x200022c7a6c]
 6: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x240) [0x20001c19ee0]
 7: (PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, boost::intrusive_ptr<OpRequest>)+0x90) [0x20001e871b0]
 8: (ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x650) [0x2000203e5f0]
 9: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x3a4) [0x200020440c4]
 10: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x94) [0x20001ecea74]
 11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x814) [0x20001de1384]
 12: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x614) [0x20001b817d4]
 13: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0xb8) [0x20001f98968]
 14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1c24) [0x20001bb5fd4]
 15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0xab4) [0x200024d60a4]
 16: (ShardedThreadPool::WorkThreadSharded::entry()+0x28) [0x200024da278]
 17: (Thread::entry_wrapper()+0xec) [0x20002769b4c]
 18: (Thread::_entry_func(void*)+0x20) [0x20002769ba0]
 19: (()+0x80fc) [0x2000d2700fc]
 20: (()+0x119854) [0x2000bed1854]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
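
To make the excerpt above easier to scan, the repeated "unexpected need for" messages can be tallied per PG, together with the signal that ended the process. This is only a minimal Python sketch: it assumes the log lines keep the layout shown above, and the function name `summarize` and both regular expressions are illustrative helpers, not part of Ceph.

```python
import re
from collections import Counter

# Assumed layout (taken from the excerpt above):
#   ... <pgid> unexpected need for <object> have <version> flags = <flags> tried to add ...
NEED_RE = re.compile(
    r"\s(?P<pg>\d+\.[0-9a-f]+) unexpected need for \S+ "
    r"have (?P<version>\d+'\d+) flags = (?P<flags>\w+)"
)
# Assumed layout: "*** Caught signal (<name>) **"
SIGNAL_RE = re.compile(r"\*\*\* Caught signal \((?P<signal>[^)]+)\) \*\*")


def summarize(lines):
    """Count 'unexpected need for' messages per (PG, flags) and
    return the last fatal signal name seen, or None if there is none."""
    per_pg = Counter()
    signal = None
    for line in lines:
        match = NEED_RE.search(line)
        if match:
            per_pg[(match.group("pg"), match.group("flags"))] += 1
            continue
        match = SIGNAL_RE.search(line)
        if match:
            signal = match.group("signal")
    return per_pg, signal
```

The function takes any iterable of lines, so an open file handle on the saved OSD log can be passed in directly. The per-PG counts only show which PGs were logging these messages shortly before the crash; they do not by themselves identify the root cause.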




huxiaoyu@xxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx