osd crash with object store set to newstore

Srikanth Madugundi <srikanth.madugundi@xxxxxxxxx> · Mon, 1 Jun 2015 18:40:22 -0700

Hi Sage and all,

I build ceph code from wip-newstore on RHEL7 and running performance
tests to compare with filestore. After few hours of running the tests
the osd daemons started to crash. Here is the stack trace, the osd
crashes immediately after the restart. So I could not get the osd up
and running.

ceph version b8e22893f44979613738dfcdd40dada2b513118
(eb8e22893f44979613738dfcdd40dada2b513118)
1: /usr/bin/ceph-osd() [0xb84652]
2: (()+0xf130) [0x7f915f84f130]
3: (gsignal()+0x39) [0x7f915e2695c9]
4: (abort()+0x148) [0x7f915e26acd8]
5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f915eb6d9d5]
6: (()+0x5e946) [0x7f915eb6b946]
7: (()+0x5e973) [0x7f915eb6b973]
8: (()+0x5eb9f) [0x7f915eb6bb9f]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x27a) [0xc84c5a]
10: (NewStore::collection_list_partial(coll_t, ghobject_t, int, int,
snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*,
ghobject_t*)+0x13c9) [0xa08639]
11: (PGBackend::objects_list_partial(hobject_t const&, int, int,
snapid_t, std::vector<hobject_t, std::allocator<hobject_t> >*,
hobject_t*)+0x352) [0x918a02]
12: (ReplicatedPG::do_pg_op(std::tr1::shared_ptr<OpRequest>)+0x1066) [0x8aa906]
13: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>&)+0x1eb) [0x8cd06b]
14: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x68a) [0x85dbea]
15: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ed)
[0x6c3f5d]
16: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x2e9) [0x6c4449]
17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x86f) [0xc746bf]
18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xc767f0]
19: (()+0x7df3) [0x7f915f847df3]
20: (clone()+0x6d) [0x7f915e32a01d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

Please let me know the cause of this crash, when this crash happens I
noticed that two osds on separate machines are down. I can bring one
osd up but restarting the other osd causes both OSDs to crash. My
understanding is the crash seems to happen when two OSDs try to
communicate and replicate a particular PG.

Regards
Srikanth
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html