Re: Multiple OSD crashing on 12.2.0. Bluestore / EC pool / rbd


 



On 17-09-06 16:24, Jean-Francois Nadeau wrote:
Hi,

On a 4-node / 48-OSD Luminous cluster I'm giving RBD on EC pools + BlueStore a try.

Setup went fine, but after a few bench runs several OSDs are failing and many won't even restart.

ceph osd erasure-code-profile set myprofile \
   k=2 \
   m=1 \
   crush-failure-domain=host
ceph osd pool create mypool 1024 1024 erasure myprofile
ceph osd pool set mypool allow_ec_overwrites true
rbd pool init mypool
ceph -s
ceph health detail
ceph osd pool create metapool 1024 1024 replicated
rbd create --size 1024G --data-pool mypool --image metapool/test1
rbd bench -p metapool test1 --io-type write --io-size 8192 --io-pattern rand --io-total 10G
...
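(Not from the original post: as a quick sanity check on the bench parameters above, the number of random writes the run issues is simply io-total divided by io-size.)

```shell
# Sanity check (illustrative, not from the original post):
# how many writes `rbd bench --io-size 8192 --io-total 10G` issues.
io_total=$((10 * 1024 * 1024 * 1024))  # 10 GiB in bytes
io_size=8192                           # 8 KiB per write
echo $((io_total / io_size))           # 1310720 writes
```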


Log from one of the many failing OSDs:

Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Started Ceph object storage daemon osd.12.
Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 /var/lib/ceph/osd/ceph-12/journal
Sep 05 17:02:56 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2017-09-05 17:02:56.627301 7fe1a2e42d00 -1 osd.12 2219 log_to_monitors {default=true}
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2017-09-05 17:02:58.686723 7fe1871ac700 -1 bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No such file or directory not handled on operation 15 (op 0, counting from 0)
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2017-09-05 17:02:58.686742 7fe1871ac700 -1 bluestore(/var/lib/ceph/osd/ceph-12) unexpected error code
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7fe1871ac700 time 2017-09-05 17:02:58.686821
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: 9282: FAILED assert(0 == "unexpected error")
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7fe1a38bf510]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1487) [0x7fe1a3796057]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x7fe1a37970a0]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 4: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7fe1a3508745]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 5: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x631) [0x7fe1a3628711]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 6: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) [0x7fe1a36392b7]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7fe1a353da10]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x58e) [0x7fe1a34a9a7e]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7fe1a333c729]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7fe1a35ac197]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x7fe1a3367c8e]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7fe1a38c5029]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fe1a38c6fc0]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 14: (()+0x7dc5) [0x7fe1a0484dc5]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 15: (clone()+0x6d) [0x7fe19f57876d]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
[root@r72-k7-06-01 ~]# journalctl -u  ceph-osd@12 --no-pager -n 100
-- Logs begin at Wed 2017-08-30 10:26:26 UTC, end at Tue 2017-09-05 22:06:19 UTC. --
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 15: (clone()+0x6d) [0x7f0160c9d76d]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: -5496> 2017-09-05 17:08:03.460844 7f0164567d00 -1 osd.12 2362 log_to_monitors {default=true}
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: -74> 2017-09-05 17:08:05.837648 7f01488d1700 -1 bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No such file or directory not handled on operation 15 (op 0, counting from 0)
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: -73> 2017-09-05 17:08:05.837670 7f01488d1700 -1 bluestore(/var/lib/ceph/osd/ceph-12) unexpected error code
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 0> 2017-09-05 17:08:05.843218 7f01488d1700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f01488d1700 time 2017-09-05 17:08:05.837770
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: 9282: FAILED assert(0 == "unexpected error")
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f0164fe4510]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x7f0164ebc0a0]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x631) [0x7f0164d4d711]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) [0x7f0164d5e2b7]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f0164c62a10]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7f0164a61729]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f0164cd1197]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 11: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7f0164fea029]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 14: (()+0x7dc5) [0x7f0161ba9dc5]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 15: (clone()+0x6d) [0x7f0160c9d76d]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: *** Caught signal (Aborted) **
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in thread 7f01488d1700 thread_name:tp_osd_tp
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1: (()+0xa23b21) [0x7f0164fa5b21]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2: (()+0xf370) [0x7f0161bb1370]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3: (gsignal()+0x37) [0x7f0160bdb1d7]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4: (abort()+0x148) [0x7f0160bdc8c8]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x7f0164fe4684]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x7f0164ebc0a0]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x631) [0x7f0164d4d711]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 10: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) [0x7f0164d5e2b7]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f0164c62a10]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7f0164a61729]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f0164cd1197]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7f0164fea029]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 18: (()+0x7dc5) [0x7f0161ba9dc5]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 19: (clone()+0x6d) [0x7f0160c9d76d]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2017-09-05 17:08:05.883240 7f01488d1700 -1 *** Caught signal (Aborted) **
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in thread 7f01488d1700 thread_name:tp_osd_tp
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1: (()+0xa23b21) [0x7f0164fa5b21]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2: (()+0xf370) [0x7f0161bb1370]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3: (gsignal()+0x37) [0x7f0160bdb1d7]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4: (abort()+0x148) [0x7f0160bdc8c8]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x7f0164fe4684]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x7f0164ebc0a0]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x631) [0x7f0164d4d711]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 10: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) [0x7f0164d5e2b7]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f0164c62a10]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7f0164a61729]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f0164cd1197]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7f0164fea029]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 18: (()+0x7dc5) [0x7f0161ba9dc5]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 19: (clone()+0x6d) [0x7f0160c9d76d]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 0> 2017-09-05 17:08:05.883240 7f01488d1700 -1 *** Caught signal (Aborted) **
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in thread 7f01488d1700 thread_name:tp_osd_tp
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1: (()+0xa23b21) [0x7f0164fa5b21]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2: (()+0xf370) [0x7f0161bb1370]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3: (gsignal()+0x37) [0x7f0160bdb1d7]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4: (abort()+0x148) [0x7f0160bdc8c8]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x7f0164fe4684]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0) [0x7f0164ebc0a0]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x631) [0x7f0164d4d711]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 10: (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327) [0x7f0164d5e2b7]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x7f0164c62a10]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7f0164a61729]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x7f0164cd1197]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 15: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) [0x7f0164fea029]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 18: (()+0x7dc5) [0x7f0161ba9dc5]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 19: (clone()+0x6d) [0x7f0160c9d76d]
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: ceph-osd@12.service: main process exited, code=killed, status=6/ABRT
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Unit ceph-osd@12.service entered failed state.
Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: ceph-osd@12.service failed.
Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: ceph-osd@12.service holdoff time over, scheduling restart.
Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: start request repeated too quickly for ceph-osd@12.service
Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Failed to start Ceph object storage daemon osd.12.
Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Unit ceph-osd@12.service entered failed state.
Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: ceph-osd@12.service failed.



Drives are OK, filesystems are mounted, and there are no suspicious kernel logs.

What's next to troubleshoot?

While I am unable to assist you with this problem, I think it would be useful to create a tracker.ceph.com issue and attach the OSD logs, as they dump recent events too.

BTW, I have a few open issues related to OSD stability/reliability on a similar setup: http://tracker.ceph.com/issues/21259 and http://tracker.ceph.com/issues/21180
Can you grep your logs to see if you experienced something similar?
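A sketch of that grep (the unit name and sample file here are illustrative; on a real node you would pipe journalctl for each OSD unit instead):

```shell
# Sketch: scan OSD logs for the assert signatures seen in this thread.
# On a real node: journalctl -u ceph-osd@12 --no-pager | grep -E "$patterns"
patterns='FAILED assert|_txc_add_transaction|unexpected error'
cat <<'EOF' > /tmp/osd-sample.log
bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No such file or directory
BlueStore.cc: 9282: FAILED assert(0 == "unexpected error")
starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12
EOF
grep -Ec "$patterns" /tmp/osd-sample.log   # counts matching lines; prints 2 here
```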

I also got "FAILED assert(i->prior_version == last || i->is_error())", "FAILED assert(p.second->shared_blob_set.empty())", and an abort on thread_name:tp_peering today, which I haven't gotten around to reporting yet.



Thanks!



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


