OSDs are crashing during PG replication

Hi,

The same 2 of our 18 OSDs keep crashing. I think it happens during PG replication, because only these 2 OSDs crash and they are the same ones every time.

0> 2016-02-24 04:51:45.884445 7fd994825700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*, ceph::buffer::list::iterator&, OSDOp&, ObjectContextRef&, bool)' thread 7fd994825700 time 2016-02-24 04:51:45.870995
osd/ReplicatedPG.cc: 5558: FAILED assert(cursor.data_complete)

 ceph version 0.80.11-8-g95c4287 (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
 1: (ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*, ceph::buffer::list::iterator&, OSDOp&, std::tr1::shared_ptr<ObjectContext>&, bool)+0xffc) [0x7c1f7c]
 2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x4171) [0x809f21]
 3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x62) [0x814622]
 4: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x5f8) [0x815098]
 5: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3dd4) [0x81a3f4]
 6: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x66d) [0x7b4ecd]
 7: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a5) [0x600ee5]
 8: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x203) [0x61cba3]
 9: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x660f2c]
 10: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb20) [0xa7def0]
 11: (ThreadPool::WorkThread::entry()+0x10) [0xa7ede0]
 12: (()+0x7dc5) [0x7fd9ad03edc5]
 13: (clone()+0x6d) [0x7fd9abd2828d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.3.log
--- end dump of recent events ---
2016-02-24 04:51:45.944447 7fd994825700 -1 *** Caught signal (Aborted) **
 in thread 7fd994825700

 ceph version 0.80.11-8-g95c4287 (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
 1: /usr/bin/ceph-osd() [0x9a24f6]
 2: (()+0xf100) [0x7fd9ad046100]
 3: (gsignal()+0x37) [0x7fd9abc675f7]
 4: (abort()+0x148) [0x7fd9abc68ce8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fd9ac56b9d5]
 6: (()+0x5e946) [0x7fd9ac569946]
 7: (()+0x5e973) [0x7fd9ac569973]
 8: (()+0x5eb93) [0x7fd9ac569b93]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xa8d9df]
 10: (ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*, ceph::buffer::list::iterator&, OSDOp&, std::tr1::shared_ptr<ObjectContext>&, bool)+0xffc) [0x7c1f7c]
 11: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x4171) [0x809f21]
 12: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x62) [0x814622]
 13: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x5f8) [0x815098]
 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3dd4) [0x81a3f4]
 15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x66d) [0x7b4ecd]
 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a5) [0x600ee5]
 17: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x203) [0x61cba3]
 18: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x660f2c]
 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb20) [0xa7def0]
 20: (ThreadPool::WorkThread::entry()+0x10) [0xa7ede0]
 21: (()+0x7dc5) [0x7fd9ad03edc5]
 22: (clone()+0x6d) [0x7fd9abd2828d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
    -5> 2016-02-24 04:51:45.904559 7fd995026700  5 -- op tracker -- , seq: 19230, time: 2016-02-24 04:51:45.904559, event: started, request: osd_op(osd.13.12097:806246 rb.0.218d6.238e1f29.000000010db3@snapdir [list-snaps] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone e13252) v4
    -4> 2016-02-24 04:51:45.904598 7fd995026700  1 -- 172.16.0.1:6801/419703 --> 172.16.0.3:6844/12260 -- osd_op_reply(806246 rb.0.218d6.238e1f29.000000010db3 [list-snaps] v0'0 uv27683057 _ondisk_ = 0) v6 -- ?+0 0x9f90800 con 0x1b7838c0
    -3> 2016-02-24 04:51:45.904616 7fd995026700  5 -- op tracker -- , seq: 19230, time: 2016-02-24 04:51:45.904616, event: done, request: osd_op(osd.13.12097:806246 rb.0.218d6.238e1f29.000000010db3@snapdir [list-snaps] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone e13252) v4
    -2> 2016-02-24 04:51:45.904637 7fd995026700  5 -- op tracker -- , seq: 19231, time: 2016-02-24 04:51:45.904637, event: reached_pg, request: osd_op(osd.13.12097:806247 rb.0.218d6.238e1f29.000000010db3 [copy-get max 8388608] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone e13252) v4
    -1> 2016-02-24 04:51:45.904673 7fd995026700  5 -- op tracker -- , seq: 19231, time: 2016-02-24 04:51:45.904673, event: started, request: osd_op(osd.13.12097:806247 rb.0.218d6.238e1f29.000000010db3 [copy-get max 8388608] 3.94c2bed2 ack+read+ignore_cache+ignore_overlay+map_snap_clone e13252) v4
     0> 2016-02-24 04:51:45.944447 7fd994825700 -1 *** Caught signal (Aborted) **
 in thread 7fd994825700

 ceph version 0.80.11-8-g95c4287 (95c4287b5d24b762bc8538633c5bb2918ecfe4dd)
 1: /usr/bin/ceph-osd() [0x9a24f6]
 2: (()+0xf100) [0x7fd9ad046100]
 3: (gsignal()+0x37) [0x7fd9abc675f7]
 4: (abort()+0x148) [0x7fd9abc68ce8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7fd9ac56b9d5]
 6: (()+0x5e946) [0x7fd9ac569946]
 7: (()+0x5e973) [0x7fd9ac569973]
 8: (()+0x5eb93) [0x7fd9ac569b93]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xa8d9df]
 10: (ReplicatedPG::fill_in_copy_get(ReplicatedPG::OpContext*, ceph::buffer::list::iterator&, OSDOp&, std::tr1::shared_ptr<ObjectContext>&, bool)+0xffc) [0x7c1f7c]
 11: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x4171) [0x809f21]
 12: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x62) [0x814622]
 13: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x5f8) [0x815098]
 14: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3dd4) [0x81a3f4]
 15: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x66d) [0x7b4ecd]
 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a5) [0x600ee5]
 17: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x203) [0x61cba3]
 18: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0xac) [0x660f2c]
 19: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb20) [0xa7def0]
 20: (ThreadPool::WorkThread::entry()+0x10) [0xa7ede0]
 21: (()+0x7dc5) [0x7fd9ad03edc5]
 22: (clone()+0x6d) [0x7fd9abd2828d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.3.log
--- end dump of recent events ---
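
To make it easier to see which request was in flight when the assert fired, the log could be summarized with a small script along these lines. This is only a sketch: it assumes the line formats shown in the dump above, and the function name `summarize_crash` and both regular expressions are made up for illustration. The last "started" op is only a hint, since it may have been logged by a different thread than the one that aborted.

```python
import re

# Matches lines such as:
#   osd/ReplicatedPG.cc: 5558: FAILED assert(cursor.data_complete)
ASSERT_RE = re.compile(
    r"^(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)

# Matches op tracker lines such as:
#   ... event: started, request: osd_op(<client> <object> [<op>] ...
OP_RE = re.compile(
    r"event: (?P<event>\w+), request: "
    r"osd_op\((?P<client>\S+) (?P<object>\S+) \[(?P<op>[^\]]*)\]"
)


def summarize_crash(log_lines):
    """Return (first failed assert, last op with event 'started').

    The assert is returned as (source file, line number, expression),
    the op as (object name, operation). Either may be None if the log
    contains no matching line.
    """
    failed_assert = None
    last_started = None
    for raw in log_lines:
        line = raw.strip()
        m = ASSERT_RE.match(line)
        if m and failed_assert is None:
            failed_assert = (m.group("file"), int(m.group("line")), m.group("expr"))
        m = OP_RE.search(line)
        if m and m.group("event") == "started":
            last_started = (m.group("object"), m.group("op"))
    return failed_assert, last_started
```

It could be run over the file named in the dump, for example `summarize_crash(open("/var/log/ceph/ceph-osd.3.log"))`. On the log above it points at the `copy-get` request for object rb.0.218d6.238e1f29.000000010db3.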

--
Alexander Gubanov
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
