Sadly, the update to 0.94.6 did not solve the issue. I still can't get one of my OSDs to run at all. I have included the crash report below.
It looks like the following assert fails: https://github.com/ceph/ceph/blob/v0.94.6/src/osd/ReplicatedPG.cc line 10495
ObjectContextRef obc = get_object_context(oid, false);
assert(obc);
I already tried rebuilding the OS and updating ceph on that node, so there must be some problem with the OSD data itself.
(0.94.6 failure stack trace follows)
root@node8:~# /usr/bin/ceph-osd -i 2 --pid-file /var/run/ceph/osd.2.pid -c /etc/ceph/ceph.conf --cluster ceph -f
starting osd.2 at :/0 osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
2016-04-23 11:47:14.294809 b68c7000 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
2016-04-23 11:47:22.388962 b68c7000 -1 osd.2 338888 log_to_monitors {default=true}
osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)' thread 9d0bb350 time 2016-04-23 11:47:26.570268
osd/ReplicatedPG.cc: 10495: FAILED assert(obc)
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x70) [0x7774c8]
2: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0xa48) [0x44f8f0]
3: (ReplicatedPG::hit_set_persist()+0xf18) [0x450a78]
4: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0xd9c) [0x45c788]
5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x554) [0x3fa028]
6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a8) [0x2707a0]
7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x428) [0x270e70]
8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x740) [0x7696a0]
9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x76c1f4]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-04-23 11:47:26.585694 9d0bb350 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)' thread 9d0bb350 time 2016-04-23 11:47:26.570268
osd/ReplicatedPG.cc: 10495: FAILED assert(obc)
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x70) [0x7774c8]
2: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0xa48) [0x44f8f0]
3: (ReplicatedPG::hit_set_persist()+0xf18) [0x450a78]
4: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0xd9c) [0x45c788]
5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x554) [0x3fa028]
6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a8) [0x2707a0]
7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x428) [0x270e70]
8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x740) [0x7696a0]
9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x76c1f4]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
-3175> 2016-04-23 11:47:14.294809 b68c7000 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
-2425> 2016-04-23 11:47:22.388962 b68c7000 -1 osd.2 338888 log_to_monitors {default=true}
0> 2016-04-23 11:47:26.585694 9d0bb350 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)' thread 9d0bb350 time 2016-04-23 11:47:26.570268
osd/ReplicatedPG.cc: 10495: FAILED assert(obc)
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x70) [0x7774c8]
2: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0xa48) [0x44f8f0]
3: (ReplicatedPG::hit_set_persist()+0xf18) [0x450a78]
4: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0xd9c) [0x45c788]
5: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x554) [0x3fa028]
6: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a8) [0x2707a0]
7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x428) [0x270e70]
8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x740) [0x7696a0]
9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x76c1f4]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
in thread 9d0bb350
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0x69764c]
2: (__default_sa_restorer()+0) [0xb694ed10]
3: (gsignal()+0x38) [0xb694daa8]
2016-04-23 11:47:26.856125 9d0bb350 -1 *** Caught signal (Aborted) **
in thread 9d0bb350
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0x69764c]
2: (__default_sa_restorer()+0) [0xb694ed10]
3: (gsignal()+0x38) [0xb694daa8]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
0> 2016-04-23 11:47:26.856125 9d0bb350 -1 *** Caught signal (Aborted) **
in thread 9d0bb350
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0x69764c]
2: (__default_sa_restorer()+0) [0xb694ed10]
3: (gsignal()+0x38) [0xb694daa8]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Aborted
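Because the same assertion is printed several times per crash, it can help to condense a log into one count per failing assert. The following is only a minimal sketch, assuming the log lines look like the `osd/ReplicatedPG.cc: 10495: FAILED assert(obc)` lines in this thread; the function names are hypothetical and not part of any Ceph tool.

```python
import re
from collections import Counter

# Matches lines such as "osd/ReplicatedPG.cc: 10495: FAILED assert(obc)".
# Assumption: the format is taken from the traces in this thread only.
ASSERT_RE = re.compile(
    r"(?P<file>\S+): (?P<line>\d+): FAILED assert\((?P<expr>.*)\)\s*$"
)


def count_failed_asserts(lines):
    """Count FAILED assert lines, keyed by (file, line number, expression)."""
    counts = Counter()
    for text in lines:
        match = ASSERT_RE.search(text)
        if match:
            key = (match["file"], int(match["line"]), match["expr"])
            counts[key] += 1
    return counts


def summarize_log(path):
    """Print one line per distinct failed assert found in the log at `path`."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as log:
        counts = count_failed_asserts(log)
    for (src, lineno, expr), n in counts.most_common():
        print(f"{n:6d}  {src}:{lineno}  assert({expr})")
```

Calling `summarize_log("/var/log/ceph/ceph-osd.2.log")` on the affected node would then list each distinct assert with its count, which makes it easier to compare nodes or versions than reading the raw traces.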
On Thu, Apr 21, 2016 at 12:40 AM, Blade Doyle <blade.doyle@xxxxxxxxx> wrote:
That was a poor example, because it was an older version of ceph and the clock was not set correctly. But I don't think either of those things causes the problem, because I see it on multiple nodes:
root@node8:/var/log/ceph# grep hit_set_trim ceph-osd.2.log | wc -l
2524
root@node8:/var/log/ceph# ceph --version
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
root@node8:/var/log/ceph# date
Wed 20 Apr 23:58:59 PDT 2016
I saw this: http://tracker.ceph.com/issues/9732 so I set all the timezones. That didn't solve the problem. I am building 0.94.6 anyway.
Thanks,
Blade.
On Wed, Apr 20, 2016 at 12:37 AM, Blade Doyle <blade.doyle@xxxxxxxxx> wrote:
I get a lot of OSD crashes with the following stack. Suggestions, please:
Blade.
0> 1969-12-31 16:04:55.455688 83ccf410 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)' thread 83ccf410 time 295.324905
osd/ReplicatedPG.cc: 11011: FAILED assert(obc)
ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
1: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)+0x3f9) [0xb6c625e6]
2: (ReplicatedPG::hit_set_persist()+0x8bf) [0xb6c62fb4]
3: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0xc97) [0xb6c6eb2c]
4: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x439) [0xb6c2f01a]
5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x22b) [0xb6b0b984]
6: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>, ThreadPool::TPHandle&)+0x13d) [0xb6b1ccf6]
7: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG> >::_void_process(void*, ThreadPool::TPHandle&)+0x6b) [0xb6b4692c]
8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb93) [0xb6e152bc]
9: (ThreadPool::WorkThread::entry()+0x9) [0xb6e15aea]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com