Hi,

I have the same problem. A bug [1] was reported months ago, but
unfortunately it has not been fixed yet. I hope that if more people are
hitting this problem, the developers will be able to reproduce and fix it.
I was using kernel RBD with a cache tier.

So long,
Thomas Coelho

[1] http://tracker.ceph.com/issues/20222

On 06.09.2017 at 15:41, Henrik Korkuc wrote:
> On 17-09-06 16:24, Jean-Francois Nadeau wrote:
>> Hi,
>>
>> On a 4-node / 48-OSD Luminous cluster I'm giving RBD on EC pools +
>> BlueStore a try.
>>
>> Setup went fine, but after a few bench runs several OSDs are failing
>> and many won't even restart.
>>
>> ceph osd erasure-code-profile set myprofile \
>>     k=2 \
>>     m=1 \
>>     crush-failure-domain=host
>> ceph osd pool create mypool 1024 1024 erasure myprofile
>> ceph osd pool set mypool allow_ec_overwrites true
>> rbd pool init mypool
>> ceph -s
>> ceph health detail
>> ceph osd pool create metapool 1024 1024 replicated
>> rbd create --size 1024G --data-pool mypool --image metapool/test1
>> rbd bench -p metapool test1 --io-type write --io-size 8192 --io-pattern rand --io-total 10G
>> ...
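[Aside, not part of the original report: as a minimal sanity check on a Luminous cluster, the EC profile and the overwrite flag used in the steps above can be confirmed with the standard commands below; the pool and profile names simply follow the example above.]

    # Show the erasure-code profile the data pool was created with
    ceph osd erasure-code-profile get myprofile

    # Confirm that overwrites are enabled on the EC data pool
    ceph osd pool get mypool allow_ec_overwrites

    # List both pools with their flags, size/min_size and crush rule
    ceph osd pool ls detail | grep -E 'mypool|metapool'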
>>
>> One of many OSD failing logs:
>>
>> Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Started Ceph object storage daemon osd.12.
>> Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 /var/lib/ceph/osd/ceph-12/journal
>> Sep 05 17:02:56 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2017-09-05 17:02:56.627301 7fe1a2e42d00 -1 osd.12 2219 log_to_monitors {default=true}
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2017-09-05 17:02:58.686723 7fe1871ac700 -1 bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No such file or directory not handled on operation 15 (op 0, counting from 0)
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2017-09-05 17:02:58.686742 7fe1871ac700 -1 bluestore(/var/lib/ceph/osd/ceph-12) unexpected error code
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7fe1871ac700 time 2017-09-05 17:02:58.686821
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: 9282: FAILED assert(0 == "unexpected error")
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fe1a38bf510]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1487) [0x7fe1a3796057]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x7fe1a37970a0]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 4: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7fe1a3508745]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 5: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x631) [0x7fe1a3628711]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 6: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) [0x7fe1a36392b7]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7fe1a353da10]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x58e) [0x7fe1a34a9a7e]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7fe1a333c729]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7fe1a35ac197]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x7fe1a3367c8e]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7fe1a38c5029]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fe1a38c6fc0]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 14: (()+0x7dc5) [0x7fe1a0484dc5]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 15: (clone()+0x6d) [0x7fe19f57876d]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
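[Aside, not from the original logs: the two lines that matter above are the "_txc_add_transaction error (2) No such file or directory not handled on operation 15" message and the "FAILED assert(0 == "unexpected error")" that follows it. Assuming the default file logging location under /var/log/ceph, a rough way to see which OSDs on a host hit the same failure is something like:]

    # List the OSD log files on this host that contain the failing transaction error
    grep -l '_txc_add_transaction error (2)' /var/log/ceph/ceph-osd.*.log

    # Count the matching asserts per OSD log file
    grep -c 'FAILED assert(0 == "unexpected error")' /var/log/ceph/ceph-osd.*.log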
>>
>> [root@r72-k7-06-01 ~]# journalctl -u ceph-osd@12 --no-pager -n 100
>> -- Logs begin at Wed 2017-08-30 10:26:26 UTC, end at Tue 2017-09-05 22:06:19 UTC. --
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 15: (clone()+0x6d) [0x7f0160c9d76d]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: -5496> 2017-09-05 17:08:03.460844 7f0164567d00 -1 osd.12 2362 log_to_monitors {default=true}
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: -74> 2017-09-05 17:08:05.837648 7f01488d1700 -1 bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No such file or directory not handled on operation 15 (op 0, counting from 0)
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: -73> 2017-09-05 17:08:05.837670 7f01488d1700 -1 bluestore(/var/lib/ceph/osd/ceph-12) unexpected error code
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 0> 2017-09-05 17:08:05.843218 7f01488d1700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f01488d1700 time 2017-09-05 17:08:05.837770
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: 9282: FAILED assert(0 == "unexpected error")
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f0164fe4510]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x7f0164ebc0a0]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x631) [0x7f0164d4d711]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) [0x7f0164d5e2b7]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f0164c62a10]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7f0164a61729]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f0164cd1197]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7f0164fea029]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 14: (()+0x7dc5) [0x7f0161ba9dc5]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 15: (clone()+0x6d) [0x7f0160c9d76d]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: *** Caught signal (Aborted) **
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in thread 7f01488d1700 thread_name:tp_osd_tp
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1: (()+0xa23b21) [0x7f0164fa5b21]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2: (()+0xf370) [0x7f0161bb1370]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3: (gsignal()+0x37) [0x7f0160bdb1d7]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4: (abort()+0x148) [0x7f0160bdc8c8]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x7f0164fe4684]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x7f0164ebc0a0]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x631) [0x7f0164d4d711]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 10: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) [0x7f0164d5e2b7]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f0164c62a10]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7f0164a61729]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f0164cd1197]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7f0164fea029]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 18: (()+0x7dc5) [0x7f0161ba9dc5]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 19: (clone()+0x6d) [0x7f0160c9d76d]
>> [... the identical "*** Caught signal (Aborted) **" report and 19-frame backtrace are printed twice more in this dump (once with the 2017-09-05 17:08:05.883240 timestamp and once as recent event 0>), each followed by the same objdump NOTE ...]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: ceph-osd@12.service: main process exited, code=killed, status=6/ABRT
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Unit ceph-osd@12.service entered failed state.
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: ceph-osd@12.service failed.
>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: ceph-osd@12.service holdoff time over, scheduling restart.
>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: start request repeated too quickly for ceph-osd@12.service
>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Failed to start Ceph object storage daemon osd.12.
>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Unit ceph-osd@12.service entered failed state.
>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: ceph-osd@12.service failed.
>>
>> Drives are OK, filesystems are mounted, and there are no suspicious
>> kernel logs.
>>
>> What's next to troubleshoot?
>>
> While I am unable to assist you with this problem, I think it would be
> useful to create a tracker.ceph.com issue and attach the OSD logs, since
> they include a dump of recent events.
>
> BTW, I have a few open issues too, related to OSD stability/reliability
> on a similar setup - http://tracker.ceph.com/issues/21259 and
> http://tracker.ceph.com/issues/21180.
> Can you grep your logs to see whether you experienced something similar?
>
> And today I got "FAILED assert(i->prior_version == last || i->is_error())",
> "FAILED assert(p.second->shared_blob_set.empty())" and an abort in
> thread_name:tp_peering, which I haven't gotten around to reporting yet.
>
>> thanks!
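[Aside, not from the original thread: a minimal sketch of how the material Henrik suggests attaching could be collected on the affected host, assuming the systemd unit and OSD paths shown above; the debug levels and the offline fsck are optional extras, not something the thread prescribes.]

    # Export the full journal for the crashed OSD so it can be attached to a tracker issue
    journalctl -u ceph-osd@12 --no-pager > osd.12.journal.txt

    # Optionally raise BlueStore/OSD debug levels on the running OSDs before re-running the benchmark
    ceph tell osd.* injectargs '--debug_bluestore 20 --debug_osd 20'

    # Optionally run an offline consistency check of the crashed OSD's BlueStore (daemon must be stopped)
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-12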
--
Dr. Thomas Coelho
HRZ Service Center Riedberg
Goethe Universität Frankfurt | Campus Riedberg
Max-von-Laue Str. 1 | 60438 Frankfurt am Main
Room 1.128

Phone: +49-(0)69 798 40026
E-Mail: coelho@xxxxxxxxxxxxxxxxxxxxxxxxxx

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com