Re: Multiple OSD crashing on 12.2.0. Bluestore / EC pool / rbd

I'm hitting this same issue on 12.2.5. Upgraded one node to 12.2.10 and it didn't clear.

Six OSDs are flapping with this error. I know this is an older issue, but are traces still needed? I don't see a resolution available.

Thanks,

Dan

On Wed, Sep 6, 2017 at 10:30 PM Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
These error logs look like they are being generated here,
https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L8987-L8993
or possibly here,
https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L9230-L9236.

Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
2017-09-05 17:02:58.686723 7fe1871ac700 -1
bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No
such file or directory not handled on operation 15 (op 0, counting
from 0)

The table of operations is here,
https://github.com/ceph/ceph/blob/master/src/os/ObjectStore.h#L370

Operation 15 is OP_SETATTRS so it appears to be some extended
attribute operation that is failing.
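
If you want to confirm that the same operation is the one failing on your OSDs, grepping the journal for the error line should be enough (just a sketch, assuming osd.12 as in the logs below; substitute your own OSD id):

journalctl -u ceph-osd@12 --no-pager | grep '_txc_add_transaction error'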

Can someone run ceph-osd under strace and find the last system
call (probably a call that manipulates xattrs) that returns -2 in the
thread that crashes (or that outputs the above messages)?

strace -fvttyyTo /tmp/strace.out -s 1024 ceph-osd [system specific
arguments]
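
Once the daemon has aborted, grepping the capture for ENOENT (errno 2, "No such file or directory") near the end of the file is probably the quickest way to spot the failing call; a sketch:

grep -n 'ENOENT' /tmp/strace.out | tail -20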

Capturing logs with "debug_bluestore = 20" may tell us more as well.
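
In case it helps, the debug level can be raised either in ceph.conf before restarting the OSD, or injected into a running daemon; a minimal sketch, assuming osd.12 is the affected daemon and that it stays up long enough for the tell command to reach it:

# in ceph.conf on the OSD host, then restart the daemon
[osd]
debug bluestore = 20/20

# or, on a live daemon
ceph tell osd.12 injectargs '--debug_bluestore 20/20'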

We need to work out what resource it is trying to access when it
receives the error '2' (No such file or directory).
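
With the OSD stopped, ceph-objectstore-tool can also be pointed at the store to check whether the objects the failing transaction refers to actually exist; roughly (assuming the data path from the logs below, and double-checking the tool's --help for the exact syntax):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op list | less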


On Thu, Sep 7, 2017 at 12:13 AM, Thomas Coelho
<coelho@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> I have the same problem. A bug [1] has been open for months, but
> unfortunately it has not been fixed yet. I hope that, as more people hit
> this problem, the developers can reproduce and fix it.
>
> I was using Kernel-RBD with a Cache Tier.
>
> so long
> Thomas Coelho
>
> [1] http://tracker.ceph.com/issues/20222
>
>
> On 06.09.2017 at 15:41, Henrik Korkuc wrote:
>> On 17-09-06 16:24, Jean-Francois Nadeau wrote:
>>> Hi,
>>>
>>> On a 4-node / 48-OSD Luminous cluster I'm giving RBD on EC pools +
>>> BlueStore a try.
>>>
>>> Setup went fine, but after a few bench runs several OSDs are failing
>>> and many won't even restart.
>>>
>>> ceph osd erasure-code-profile set myprofile \
>>>    k=2 \
>>>    m=1 \
>>>    crush-failure-domain=host
>>> ceph osd pool create mypool 1024 1024 erasure myprofile
>>> ceph osd pool set mypool allow_ec_overwrites true
>>> rbd pool init mypool
>>> ceph -s
>>> ceph health detail
>>> ceph osd pool create metapool 1024 1024 replicated
>>> rbd create --size 1024G --data-pool mypool --image metapool/test1
>>> rbd bench -p metapool test1 --io-type write --io-size 8192
>>> --io-pattern rand --io-total 10G
>>> ...
>>>
>>>
>>> One of many OSD failing logs
>>>
>>> Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
>>> Started Ceph object storage daemon osd.12.
>>> Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12
>>> /var/lib/ceph/osd/ceph-12/journal
>>> Sep 05 17:02:56 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> 2017-09-05 17:02:56.627301 7fe1a2e42d00 -1 osd.12 2219 log_to_monitors
>>> {default=true}
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> 2017-09-05 17:02:58.686723 7fe1871ac700 -1
>>> bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No
>>> such file or directory not handled on operation 15 (op 0, counting from 0)
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> 2017-09-05 17:02:58.686742 7fe1871ac700 -1
>>> bluestore(/var/lib/ceph/osd/ceph-12) unexpected error code
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/
>>> centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
>>> In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>>> ObjectStore::Transaction*)' thread 7fe1871ac700 time 2017-09-05 17:02:58.686821
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/
>>> centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
>>> 9282: FAILED assert(0 == "unexpected error")
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
>>> luminous (rc)
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 1:
>>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x110) [0x7fe1a38bf510]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2:
>>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>>> ObjectStore::Transaction*)+0x1487)
>>>  [0x7fe1a3796057]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 3:
>>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
>>> std::vector<ObjectStore::Transaction,
>>>  std::allocator<ObjectStore::Transaction> >&,
>>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
>>> [0x7fe1a37970a0]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 4:
>>> (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction,
>>> std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x65)
>>> [0x7fe1a3508745]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 5:
>>> (ECBackend::handle_sub_write(pg_shard_t,
>>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&,
>>> Context*)+0x631) [0x7fe1a3628711]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 6:
>>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
>>> [0x7fe1a36392b7]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 7:
>>> (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>> [0x7fe1a353da10]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 8:
>>> (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>> ThreadPool::TPHandle&)+0x58e) [0x7fe1a34a9a7e]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 9:
>>> (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9)
>>> [0x7fe1a333c729]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>> const&)+0x57) [0x7fe1a35ac197]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> 11: (OSD::ShardedOpWQ::_process(unsigned int,
>>> ceph::heartbeat_handle_d*)+0xfce) [0x7fe1a3367c8e]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
>>> [0x7fe1a38c5029]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fe1a38c6fc0]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> 14: (()+0x7dc5) [0x7fe1a0484dc5]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> 15: (clone()+0x6d) [0x7fe19f57876d]
>>> Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]:
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>> [root@r72-k7-06-01 ~]# journalctl -u  ceph-osd@12 --no-pager -n 100
>>> -- Logs begin at Wed 2017-08-30 10:26:26 UTC, end at Tue 2017-09-05
>>> 22:06:19 UTC. --
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 15: (clone()+0x6d) [0x7f0160c9d76d]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> -5496> 2017-09-05 17:08:03.460844 7f0164567d00 -1 osd.12 2362
>>> log_to_monitors {default=true}
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> -74> 2017-09-05 17:08:05.837648 7f01488d1700 -1
>>> bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transaction error (2) No
>>> such file or directory not handled on operation 15 (op 0, counting from 0)
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> -73> 2017-09-05 17:08:05.837670 7f01488d1700 -1
>>> bluestore(/var/lib/ceph/osd/ceph-12) unexpected error code
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 0>
>>> 2017-09-05 17:08:05.843218 7f01488d1700 -1
>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
>>> In function 'void
>>> BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>>> ObjectStore::Transaction*)' thread 7f01488d1700 time 2017-09-05
>>> 17:08:05.837770
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
>>> 9282: FAILED assert(0 == "unexpected error")
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
>>> luminous (rc)
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1:
>>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x110) [0x7f0164fe4510]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2:
>>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3:
>>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
>>> std::vector<ObjectStore::Transaction,
>>> std::allocator<ObjectStore::Transaction> >&,
>>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
>>> [0x7f0164ebc0a0]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4:
>>> (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction>
>>> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5:
>>> (ECBackend::handle_sub_write(pg_shard_t,
>>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&,
>>> Context*)+0x631) [0x7f0164d4d711]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6:
>>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
>>> [0x7f0164d5e2b7]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7:
>>> (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>> [0x7f0164c62a10]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8:
>>> (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9:
>>> (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9)
>>> [0x7f0164a61729]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>> const&)+0x57) [0x7f0164cd1197]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 11: (OSD::ShardedOpWQ::_process(unsigned int,
>>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
>>> [0x7f0164fea029]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 14: (()+0x7dc5) [0x7f0161ba9dc5]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 15: (clone()+0x6d) [0x7f0160c9d76d]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> *** Caught signal (Aborted) **
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in
>>> thread 7f01488d1700 thread_name:tp_osd_tp
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
>>> luminous (rc)
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1:
>>> (()+0xa23b21) [0x7f0164fa5b21]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2:
>>> (()+0xf370) [0x7f0161bb1370]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3:
>>> (gsignal()+0x37) [0x7f0160bdb1d7]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4:
>>> (abort()+0x148) [0x7f0160bdc8c8]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5:
>>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x284) [0x7f0164fe4684]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6:
>>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7:
>>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
>>> std::vector<ObjectStore::Transaction,
>>> std::allocator<ObjectStore::Transaction> >&,
>>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
>>> [0x7f0164ebc0a0]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8:
>>> (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction>
>>> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9:
>>> (ECBackend::handle_sub_write(pg_shard_t,
>>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&,
>>> Context*)+0x631) [0x7f0164d4d711]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 10:
>>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
>>> [0x7f0164d5e2b7]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>> [0x7f0164c62a10]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9)
>>> [0x7f0164a61729]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>> const&)+0x57) [0x7f0164cd1197]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 15: (OSD::ShardedOpWQ::_process(unsigned int,
>>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
>>> [0x7f0164fea029]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 18: (()+0x7dc5) [0x7f0161ba9dc5]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 19: (clone()+0x6d) [0x7f0160c9d76d]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 2017-09-05 17:08:05.883240 7f01488d1700 -1 *** Caught signal (Aborted) **
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in
>>> thread 7f01488d1700 thread_name:tp_osd_tp
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
>>> luminous (rc)
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1:
>>> (()+0xa23b21) [0x7f0164fa5b21]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2:
>>> (()+0xf370) [0x7f0161bb1370]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3:
>>> (gsignal()+0x37) [0x7f0160bdb1d7]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4:
>>> (abort()+0x148) [0x7f0160bdc8c8]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5:
>>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x284) [0x7f0164fe4684]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6:
>>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7:
>>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
>>> std::vector<ObjectStore::Transaction,
>>> std::allocator<ObjectStore::Transaction> >&,
>>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
>>> [0x7f0164ebc0a0]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8:
>>> (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction>
>>> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9:
>>> (ECBackend::handle_sub_write(pg_shard_t,
>>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&,
>>> Context*)+0x631) [0x7f0164d4d711]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 10:
>>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
>>> [0x7f0164d5e2b7]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>> [0x7f0164c62a10]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9)
>>> [0x7f0164a61729]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>> const&)+0x57) [0x7f0164cd1197]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 15: (OSD::ShardedOpWQ::_process(unsigned int,
>>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
>>> [0x7f0164fea029]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 18: (()+0x7dc5) [0x7f0161ba9dc5]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 19: (clone()+0x6d) [0x7f0160c9d76d]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 0>
>>> 2017-09-05 17:08:05.883240 7f01488d1700 -1 *** Caught signal (Aborted) **
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: in
>>> thread 7f01488d1700 thread_name:tp_osd_tp
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c)
>>> luminous (rc)
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 1:
>>> (()+0xa23b21) [0x7f0164fa5b21]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 2:
>>> (()+0xf370) [0x7f0161bb1370]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 3:
>>> (gsignal()+0x37) [0x7f0160bdb1d7]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 4:
>>> (abort()+0x148) [0x7f0160bdc8c8]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 5:
>>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x284) [0x7f0164fe4684]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 6:
>>> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
>>> ObjectStore::Transaction*)+0x1487) [0x7f0164ebb057]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 7:
>>> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
>>> std::vector<ObjectStore::Transaction,
>>> std::allocator<ObjectStore::Transaction> >&,
>>> boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a0)
>>> [0x7f0164ebc0a0]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 8:
>>> (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction>
>>> >&, boost::intrusive_ptr<OpRequest>)+0x65) [0x7f0164c2d745]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]: 9:
>>> (ECBackend::handle_sub_write(pg_shard_t,
>>> boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&,
>>> Context*)+0x631) [0x7f0164d4d711]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 10:
>>> (ECBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x327)
>>> [0x7f0164d5e2b7]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 11: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50)
>>> [0x7f0164c62a10]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 12: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
>>> ThreadPool::TPHandle&)+0x58e) [0x7f0164bcea7e]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 13: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f9)
>>> [0x7f0164a61729]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 14: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
>>> const&)+0x57) [0x7f0164cd1197]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 15: (OSD::ShardedOpWQ::_process(unsigned int,
>>> ceph::heartbeat_handle_d*)+0xfce) [0x7f0164a8cc8e]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 16: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
>>> [0x7f0164fea029]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 17: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f0164febfc0]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 18: (()+0x7dc5) [0x7f0161ba9dc5]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> 19: (clone()+0x6d) [0x7f0160c9d76d]
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[6441]:
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
>>> ceph-osd@12.service: main process exited, code=killed, status=6/ABRT
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Unit
>>> ceph-osd@12.service entered failed state.
>>> Sep 05 17:08:05 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
>>> ceph-osd@12.service failed.
>>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
>>> ceph-osd@12.service holdoff time over, scheduling restart.
>>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: start
>>> request repeated too quickly for ceph-osd@12.service
>>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Failed
>>> to start Ceph object storage daemon osd.12.
>>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: Unit
>>> ceph-osd@12.service entered failed state.
>>> Sep 05 17:08:26 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]:
>>> ceph-osd@12.service failed.
>>>
>>>
>>>
>>> Drives are OK, filesystems are mounted, and there are no suspicious kernel logs.
>>>
>>> What's next to troubleshoot?
>>>
>> While I am unable to assist you with this problem, I think it would be
>> useful to create a tracker.ceph.com issue and attach the OSD logs, since
>> they include a dump of recent events too.
>>
>> By the way, I have a few open issues related to OSD stability/reliability
>> on a similar setup - http://tracker.ceph.com/issues/21259 and
>> http://tracker.ceph.com/issues/21180.
>> Can you grep your logs to see if you experienced something similar?
>>
>> And today I got "FAILED assert(i->prior_version == last || i->is_error())",
>> "FAILED assert(p.second->shared_blob_set.empty())" and an abort in
>> thread_name:tp_peering, which I haven't had a chance to report yet.
>>
>>
>>>
>>> thanks !
>>>
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> --
> Dr. Thomas Coelho
> HRZ Service Center Riedberg
>
> Goethe Universität Frankfurt | Campus Riedberg
> Max-von-Laue Str. 1 | 60438 Frankfurt am Main
> Room 1.128
>
> Phone:  +49-(0)69 798 40026
> E-Mail: coelho@xxxxxxxxxxxxxxxxxxxxxxxxxx
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
