Re: Multiple OSD crashing on 12.2.0. Bluestore / EC pool / rbd

Hi,

I have the same problem. A bug [1] was reported months ago, but
unfortunately it has not been fixed yet. I hope that if more people run
into this problem, the developers can reproduce and fix it.

I was using kernel RBD with a cache tier.

so long
Thomas Coelho

[1] http://tracker.ceph.com/issues/20222


Am 06.09.2017 um 15:41 schrieb Henrik Korkuc:
> On 17-09-06 16:24, Jean-Francois Nadeau wrote:
>> Hi, 
>>
>> On a 4-node / 48-OSD Luminous cluster I'm trying out RBD on EC
>> pools + BlueStore.
>>
>> Setup went fine, but after a few bench runs several OSDs are failing
>> and many won't even restart.
>>
>> ceph osd erasure-code-profile set myprofile \
>>    k=2 \
>>    m=1 \
>>    crush-failure-domain=host
>> ceph osd pool create mypool 1024 1024 erasure myprofile
>> ceph osd pool set mypool allow_ec_overwrites true
>> rbd pool init mypool
>> ceph -s
>> ceph health detail
>> ceph osd pool create metapool 1024 1024 replicated
>> rbd create --size 1024G --data-pool mypool --image metapool/test1
>> rbd bench -p metapool test1 --io-type write --io-size 8192
>> --io-pattern rand --io-total 10G
>> ...
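
The space overhead implied by the profile above can be sketched as follows (my arithmetic, not from the original mail; the k/m values come from the erasure-code-profile command and the 1024 GiB figure is the test image size from the rbd create command). Each object is split into k data chunks plus m coding chunks, so raw usage is (k+m)/k times the logical size, and the pool tolerates the loss of m failure domains (here, one host):

```shell
# Overhead arithmetic for the k=2, m=1 profile above (sketch only).
k=2; m=1
logical_gib=1024                     # size of the test image created above
# raw space stored = logical * (k + m) / k; integer math is exact here
raw_gib=$(( logical_gib * (k + m) / k ))
echo "raw usage: ${raw_gib} GiB"     # 1536 GiB if the image were fully written
```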
>>
>>
>> One of many OSD failing logs
>>
>> Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
>> Started Ceph object storage daemon osd.12.
>> Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>> starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12
>> /var/lib/ceph/osd/ceph-12/journal
>> Sep 05 17:02:56 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>> 2017-09-05 17:02:56.627301 7fe1a2e42d00 -1 osd.12 2219 log_to_monitors
>> {default=true}
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>> 2017-09-05 17:02:58.686723 7fe1871ac700 -1
>> bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transac
>> tion error (2) No such file or directory not handled on operation 15
>> (op 0, counting from 0)
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>> 2017-09-05 17:02:58.686742 7fe1871ac700 -1
>> bluestore(/var/lib/ceph/osd/ceph-12) unexpected error
>>  code
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
>> In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>> ObjectStore::Transaction*)' thread 7fe1871ac700 time 2017-09-05 17:02:58.686821
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/
>> centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
>> 9282: FAILED assert(0 == "unexpected error")
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
>> luminous (rc)
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 1:
>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x110) [0x7fe1a38bf510]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2:
(BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>> ObjectStore::Transaction*)+0x1487) [0x7fe1a3796057]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 3:
>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
>> std::vector<ObjectStore::Transaction,
>>  std::allocator<ObjectStore::Transaction> >&,
>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
>> [0x7fe1a37970a0]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 4:
(PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction,
>> std::allocator<ObjectStore::Transaction> >&,
>> boost::intrusive_ptr<OpRequest>)+0x65) [0x7fe1a3508745]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 5:
(ECBackend::handle_sub_write(pg_shard_t,
>> boost::intrusive_ptr<OpRequest>, ECSubWrite&,
>> ZTracer::Trace const&, Context*)+0x631) [0x7fe1a3628711]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 6:
>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
>> [0x7fe1a36392b7]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 7:
>> (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>> [0x7fe1a353da10]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 8:
(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x58e) [0x7fe1a34a9a7e]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 9:
(OSD::dequeue_op(boost::intrusive_ptr<PG>,
>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9) [0x7fe1a333c729]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>> const&)+0x57) [0x7fe1a35ac197]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>> 11: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0xfce) [0x7fe1a3367c8e]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
>> [0x7fe1a38c5029]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fe1a38c6fc0]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>> 14: (()+0x7dc5) [0x7fe1a0484dc5]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>> 15: (clone()+0x6d) [0x7fe19f57876d]
>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>> [root@r72-k7-06-01 ~]# journalctl -u  ceph-osd@12 --no-pager -n 100
>> -- Logs begin at Wed 2017-08-30 10:26:26 UTC, end at Tue 2017-09-05
>> 22:06:19 UTC. --
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 15: (clone()+0x6d) [0x7f0160c9d76d]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> -5496> 2017-09-05 17:08:03.460844 7f0164567d00 -1 osd.12 2362
>> log_to_monitors {default=true}
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> -74> 2017-09-05 17:08:05.837648 7f01488d1700 -1
>> bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No
>> such file or directory not handled on operation 15 (op 0, counting from 0)
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> -73> 2017-09-05 17:08:05.837670 7f01488d1700 -1
>> bluestore(/var/lib/ceph/osd/ceph-12) unexpected error code
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 0>
>> 2017-09-05 17:08:05.843218 7f01488d1700 -1
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
>> In function 'void
>> BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>> ObjectStore::Transaction*)' thread 7f01488d1700 time 2017-09-05
>> 17:08:05.837770
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
>> 9282: FAILED assert(0 == "unexpected error")
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
>> luminous (rc)
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1:
>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x110) [0x7f0164fe4510]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2:
>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3:
>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
>> std::vector<ObjectStore::Transaction,
>> std::allocator<ObjectStore::Transaction> >&,
>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
>> [0x7f0164ebc0a0]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4:
>> (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction>
>> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5:
>> (ECBackend::handle_sub_write(pg_shard_t,
>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&,
>> Context*)+0x631) [0x7f0164d4d711]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6:
>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
>> [0x7f0164d5e2b7]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7:
>> (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>> [0x7f0164c62a10]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8:
>> (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9:
>> (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9)
>> [0x7f0164a61729]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>> const&)+0x57) [0x7f0164cd1197]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 11: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
>> [0x7f0164fea029]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 14: (()+0x7dc5) [0x7f0161ba9dc5]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 15: (clone()+0x6d) [0x7f0160c9d76d]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> *** Caught signal (Aborted) **
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in
>> thread 7f01488d1700 thread_name:tp_osd_tp
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
>> luminous (rc)
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1:
>> (()+0xa23b21) [0x7f0164fa5b21]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2:
>> (()+0xf370) [0x7f0161bb1370]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3:
>> (gsignal()+0x37) [0x7f0160bdb1d7]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4:
>> (abort()+0x148) [0x7f0160bdc8c8]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5:
>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x284) [0x7f0164fe4684]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6:
>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7:
>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
>> std::vector<ObjectStore::Transaction,
>> std::allocator<ObjectStore::Transaction> >&,
>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
>> [0x7f0164ebc0a0]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8:
>> (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction>
>> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9:
>> (ECBackend::handle_sub_write(pg_shard_t,
>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&,
>> Context*)+0x631) [0x7f0164d4d711]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 10:
>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
>> [0x7f0164d5e2b7]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>> [0x7f0164c62a10]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9)
>> [0x7f0164a61729]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>> const&)+0x57) [0x7f0164cd1197]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 15: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
>> [0x7f0164fea029]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 18: (()+0x7dc5) [0x7f0161ba9dc5]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>> 19: (clone()+0x6d) [0x7f0160c9d76d]
>> [snip: the identical Aborted backtrace above is repeated twice more
>> verbatim in the dump of recent events]
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
>> ceph-osd@12.service: main process exited, code=killed, status=6/ABRT
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Unit
>> ceph-osd@12.service entered failed state.
>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
>> ceph-osd@12.service failed.
>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
>> ceph-osd@12.service holdoff time over, scheduling restart.
>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: start
>> request repeated too quickly for ceph-osd@12.service
>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Failed
>> to start Ceph object storage daemon osd.12.
>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Unit
>> ceph-osd@12.service entered failed state.
>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
>> ceph-osd@12.service failed.
>>
>>
>>
Drives are OK, filesystems are mounted, and there are no suspicious
>> kernel logs.
>>
>> What's next to troubleshoot?
>>
While I am unable to assist you with this problem, I think it would be
> useful to create a tracker.ceph.com issue and attach your OSD logs,
> since they include a dump of recent events.
> 
BTW, I have a few open issues related to OSD stability/reliability on a
> similar setup - http://tracker.ceph.com/issues/21259 and
> http://tracker.ceph.com/issues/21180
> Can you grep your logs to see if you experienced something similar?
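
A grep along those lines might look like this (a sketch, not from the original mail; the unit name ceph-osd@12 comes from the journal excerpts above, and the pattern simply collects the assert signatures mentioned in this thread):

```shell
# Assert signatures reported in this thread; to scan a real OSD journal:
#   journalctl -u ceph-osd@12 --no-pager | grep -E "$pattern"
pattern='_txc_add_transaction error|FAILED assert|prior_version == last|shared_blob_set\.empty'
# Sanity-check the pattern against one line taken from the log above:
printf '%s\n' 'bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No such file or directory' \
  | grep -cE "$pattern"
```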
> 
And today I got "FAILED assert(i->prior_version == last || i->is_error())",
> "FAILED assert(p.second->shared_blob_set.empty())", and an abort on
> thread_name:tp_peering, which I haven't gotten around to reporting yet.
> 
> 
>>
>> thanks!
>>
>>
>>

-- 
Dr. Thomas Coelho
HRZ Service Center Riedberg

Goethe Universität Frankfurt | Campus Riedberg
Max-von-Laue Str. 1 | 60438 Frankfurt am Main
Room 1.128

Phone:  +49-(0)69 798 40026
E-Mail: coelho@xxxxxxxxxxxxxxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



