It's probably not the same issue as that ticket, which was about the OSD handling a lack of output incorrectly. (It might be handling the output incorrectly in some other way, but hopefully not...) Have you run this crush map through any test mappings yet? (A crushtool sketch for doing that is appended after the quoted thread below.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Sun, Jul 14, 2013 at 10:59 PM, Vladislav Gorbunov <vadikgo@xxxxxxxxx> wrote:
> The symptoms are like those in http://tracker.ceph.com/issues/4699
>
> On all OSDs the ceph-osd process crashes with a segfault.
>
> If I stop the MON daemons I can start the OSDs, but as soon as I start the MONs again, all the OSDs die again.
>
> A more detailed log:
> 0> 2013-07-15 16:42:05.001242 7ffe5a6fc700 -1 *** Caught signal (Segmentation fault) **
> in thread 7ffe5a6fc700
>
> ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
> 1: /usr/bin/ceph-osd() [0x790e5a]
> 2: (()+0xfcb0) [0x7ffe6b729cb0]
> 3: /usr/bin/ceph-osd() [0x893879]
> 4: (crush_do_rule()+0x1e5) [0x894065]
> 5: (CrushWrapper::do_rule(int, int, std::vector<int, std::allocator<int> >&, int, std::vector<unsigned int, std::allocator<unsigned int> > const&) const+0x7a) [0x81b2ba]
> 6: (OSDMap::_pg_to_osds(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >&) const+0x8f) [0x80d7cf]
> 7: (OSDMap::pg_to_up_acting_osds(pg_t, std::vector<int, std::allocator<int> >&, std::vector<int, std::allocator<int> >&) const+0xa6) [0x80d8a6]
> 8: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x190) [0x631b70]
> 9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x244) [0x632254]
> 10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x66d0f2]
> 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x835a66]
> 12: (ThreadPool::WorkThread::entry()+0x10) [0x837890]
> 13: (()+0x7e9a) [0x7ffe6b721e9a]
> 14: (clone()+0x6d) [0x7ffe69f9fccd]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
> 0/ 5 none
> 0/ 1 lockdep
> 0/ 1 context
> 1/ 1 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 1 buffer
> 0/ 1 timer
> 0/ 1 filer
> 0/ 1 striper
> 0/ 1 objecter
> 0/ 5 rados
> 0/ 5 rbd
> 0/ 5 journaler
> 0/ 5 objectcacher
> 0/ 5 client
> 0/ 5 osd
> 0/ 5 optracker
> 0/ 5 objclass
> 1/ 3 filestore
> 1/ 3 journal
> 0/ 5 ms
> 1/ 5 mon
> 0/10 monc
> 0/ 5 paxos
> 0/ 5 tp
> 1/ 5 auth
> 1/ 5 crypto
> 1/ 1 finisher
> 1/ 5 heartbeatmap
> 1/ 5 perfcounter
> 1/ 5 rgw
> 1/ 5 hadoop
> 1/ 5 javaclient
> 1/ 5 asok
> 1/ 1 throttle
> -2/-2 (syslog threshold)
> -1/-1 (stderr threshold)
> max_recent 10000
> max_new 1000
> log_file /var/log/ceph/ceph-osd.2.log
> --- end dump of recent events ---
>
> 2013/7/14 Vladislav Gorbunov <vadikgo@xxxxxxxxx>:
>> Hello!
>>
>> After changing the crush map, all OSDs (ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)) in pool default crash with the error:
>> 2013-07-14 17:26:23.755432 7f0c963ad700 -1 *** Caught signal (Segmentation fault) **
>> in thread 7f0c963ad700
>> ...skipping...
>> 10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x66d0f2]
>> 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x835a66]
>> 12: (ThreadPool::WorkThread::entry()+0x10) [0x837890]
>> 13: (()+0x7e9a) [0x7fac4e597e9a]
>> 14: (clone()+0x6d) [0x7fac4ce15ccd]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> --- logging levels ---
>> 0/ 5 none
>> 0/ 1 lockdep
>> 0/ 1 context
>> 1/ 1 crush
>> 1/ 5 mds
>> 1/ 5 mds_balancer
>> 1/ 5 mds_locker
>> 1/ 5 mds_log
>> 1/ 5 mds_log_expire
>> 1/ 5 mds_migrator
>> 0/ 1 buffer
>> 0/ 1 timer
>> 0/ 1 filer
>> 0/ 1 striper
>> 0/ 1 objecter
>> 0/ 5 rados
>> 0/ 5 rbd
>> 0/ 5 journaler
>> 0/ 5 objectcacher
>> 0/ 5 client
>> 0/ 5 osd
>> 0/ 5 optracker
>> 0/ 5 objclass
>> 1/ 3 filestore
>> 1/ 3 journal
>> 0/ 5 ms
>> 1/ 5 mon
>> 0/10 monc
>> 0/ 5 paxos
>> 0/ 5 tp
>> 1/ 5 auth
>> 1/ 5 crypto
>> 1/ 1 finisher
>> 1/ 5 heartbeatmap
>> 1/ 5 perfcounter
>> 1/ 5 rgw
>> 1/ 5 hadoop
>> 1/ 5 javaclient
>> 1/ 5 asok
>> 1/ 1 throttle
>> -2/-2 (syslog threshold)
>> -1/-1 (stderr threshold)
>> max_recent 10000
>> max_new 1000
>> log_file /var/log/ceph/ceph-osd.2.log
>> --- end dump of recent events ---
>>
>> ceph osd tree completely ignores OSD start/stop and always shows this map:
>> # id weight type name up/down reweight
>> -2 65.52 pool iscsi
>> -4 32.76 datacenter datacenter-cod
>> -6 32.76 host gstore1
>> 82 2.73 osd.82 up 1
>> 83 2.73 osd.83 up 1
>> 84 2.73 osd.84 up 1
>> 85 2.73 osd.85 up 1
>> 86 2.73 osd.86 up 1
>> 87 2.73 osd.87 up 1
>> 88 2.73 osd.88 up 1
>> 89 2.73 osd.89 up 1
>> 90 2.73 osd.90 up 1
>> 91 2.73 osd.91 up 1
>> 92 2.73 osd.92 up 1
>> 93 2.73 osd.93 up 1
>> -5 32.76 datacenter datacenter-rcod
>> -7 32.76 host gstore2
>> 94 2.73 osd.94 up 1
>> 95 2.73 osd.95 up 1
>> 96 2.73 osd.96 up 1
>> 97 2.73 osd.97 up 1
>> 98 2.73 osd.98 up 1
>> 99 2.73 osd.99 up 1
>> 100 2.73 osd.100 up 1
>> 101 2.73 osd.101 up 1
>> 102 2.73 osd.102 up 1
>> 103 2.73 osd.103 up 1
>> 104 2.73 osd.104 up 1
>> 105 2.73 osd.105 up 1
>> -1 68.96 pool default
>> -3 68.96 rack unknownrack
>> -9 5 host gstore5
>> 2 5 osd.2 down 1
>> -10 16 host cstore1
>> 5 1 osd.5 down 0
>> 6 1 osd.6 down 0
>> 7 1 osd.7 down 0
>> 8 1 osd.8 down 0
>> 9 1 osd.9 down 0
>> 10 1 osd.10 down 0
>> 11 1 osd.11 down 0
>> 12 1 osd.12 down 0
>> 13 1 osd.13 down 0
>> 14 1 osd.14 down 0
>> 4 1 osd.4 down 0
>> 47 1 osd.47 down 0
>> 48 1 osd.48 down 0
>> 49 1 osd.49 down 0
>> 50 1 osd.50 down 0
>> 51 1 osd.51 down 0
>> -11 21 host cstore2
>> 15 1 osd.15 down 0
>> 16 1 osd.16 down 0
>> 17 1 osd.17 down 0
>> 18 1 osd.18 down 0
>> 19 1 osd.19 down 0
>> 20 1 osd.20 down 0
>> 21 1 osd.21 down 0
>> 22 1 osd.22 down 0
>> 23 1 osd.23 down 0
>> 24 1 osd.24 down 0
>> 41 1 osd.41 down 0
>> 42 1 osd.42 down 0
>> 43 1 osd.43 down 0
>> 44 1 osd.44 down 0
>> 45 1 osd.45 down 0
>> 46 1 osd.46 down 0
>> 52 1 osd.52 down 0
>> 53 1 osd.53 down 0
>> 54 1 osd.54 down 0
>> 55 1 osd.55 down 0
>> 56 1 osd.56 up 1
>> 57 0 osd.57 up 1
>> -12 16 host cstore3
>> 25 1 osd.25 down 0
>> 26 1 osd.26 down 0
>> 27 1 osd.27 down 0
>> 28 1 osd.28 down 0
>> 29 1 osd.29 down 0
>> 30 1 osd.30 down 0
>> 31 1 osd.31 down 0
>> 32 1 osd.32 down 0
>> 33 1 osd.33 down 0
>> 34 1 osd.34 down 0
>> 35 1 osd.35 down 0
>> 36 1 osd.36 down 0
>> 37 1 osd.37 down 0
>> 38 1 osd.38 down 0
>> 39 1 osd.39 down 0
>> 40 1 osd.40 down 0
>> -13 7.64 host cstore4
>> 62 0.55 osd.62 down 0
>> 63 0.55 osd.63 down 0
>> 64 0.55 osd.64 down 0
>> 65 0.55 osd.65 down 0
>> 66 0.55 osd.66 down 0
>> 67 0.55 osd.67 down 0
>> 68 0.55 osd.68 down 0
>> 69 0.55 osd.69 down 0
>> 70 0.27 osd.70 down 0
>> 71 0.27 osd.71 down 0
>> 72 0.27 osd.72 down 0
>> 73 0.27 osd.73 down 0
>> 74 0.27 osd.74 down 0
>> 75 0.27 osd.75 down 0
>> 76 0.27 osd.76 down 0
>> 77 0.27 osd.77 down 0
>> 78 0.27 osd.78 down 0
>> 79 0.27 osd.79 down 0
>> 80 0.27 osd.80 down 0
>> 81 0.27 osd.81 down 0
>> -14 3.32 host cstore5
>> 0 0.4 osd.0 down 0
>> 1 0.5 osd.1 down 0
>> 3 0.4 osd.3 down 0
>> 58 0.4 osd.58 down 1
>> 59 0.54 osd.59 down 1
>> 60 0.54 osd.60 down 1
>> 61 0.54 osd.61 down 1
>>
>> The crush map is:
>> # begin crush map
>>
>> # devices
>> device 0 osd.0
>> device 1 osd.1
>> device 2 osd.2
>> device 3 osd.3
>> device 4 osd.4
>> device 5 osd.5
>> device 6 osd.6
>> device 7 osd.7
>> device 8 osd.8
>> device 9 osd.9
>> device 10 osd.10
>> device 11 osd.11
>> device 12 osd.12
>> device 13 osd.13
>> device 14 osd.14
>> device 15 osd.15
>> device 16 osd.16
>> device 17 osd.17
>> device 18 osd.18
>> device 19 osd.19
>> device 20 osd.20
>> device 21 osd.21
>> device 22 osd.22
>> device 23 osd.23
>> device 24 osd.24
>> device 25 osd.25
>> device 26 osd.26
>> device 27 osd.27
>> device 28 osd.28
>> device 29 osd.29
>> device 30 osd.30
>> device 31 osd.31
>> device 32 osd.32
>> device 33 osd.33
>> device 34 osd.34
>> device 35 osd.35
>> device 36 osd.36
>> device 37 osd.37
>> device 38 osd.38
>> device 39 osd.39
>> device 40 osd.40
>> device 41 osd.41
>> device 42 osd.42
>> device 43 osd.43
>> device 44 osd.44
>> device 45 osd.45
>> device 46 osd.46
>> device 47 osd.47
>> device 48 osd.48
>> device 49 osd.49
>> device 50 osd.50
>> device 51 osd.51
>> device 52 osd.52
>> device 53 osd.53
>> device 54 osd.54
>> device 55 osd.55
>> device 56 osd.56
>> device 57 osd.57
>> device 58 osd.58
>> device 59 osd.59
>> device 60 osd.60
>> device 61 osd.61
>> device 62 osd.62
>> device 63 osd.63
>> device 64 osd.64
>> device 65 osd.65
>> device 66 osd.66
>> device 67 osd.67
>> device 68 osd.68
>> device 69 osd.69
>> device 70 osd.70
>> device 71 osd.71
>> device 72 osd.72
>> device 73 osd.73
>> device 74 osd.74
>> device 75 osd.75
>> device 76 osd.76
>> device 77 osd.77
>> device 78 osd.78
>> device 79 osd.79
>> device 80 osd.80
>> device 81 osd.81
>> device 82 osd.82
>> device 83 osd.83
>> device 84 osd.84
>> device 85 osd.85
>> device 86 osd.86
>> device 87 osd.87
>> device 88 osd.88
>> device 89 osd.89
>> device 90 osd.90
>> device 91 osd.91
>> device 92 osd.92
>> device 93 osd.93
>> device 94 osd.94
>> device 95 osd.95
>> device 96 osd.96
>> device 97 osd.97
>> device 98 osd.98
>> device 99 osd.99
>> device 100 osd.100
>> device 101 osd.101
>> device 102 osd.102
>> device 103 osd.103
>> device 104 osd.104
>> device 105 osd.105
>>
>> # types
>> type 0 osd
>> type 1 host
>> type 2 rack
>> type 3 row
>> type 4 room
>> type 5 datacenter
>> type 6 pool
>>
>> # buckets
>> host gstore5 {
>> id -9 # do not change unnecessarily
>> # weight 5.000
>> alg straw
>> hash 0 # rjenkins1
>> item osd.2 weight 5.000
>> }
>> host cstore1 {
>> id -10 # do not change unnecessarily
>> # weight 16.000
>> alg straw
>> hash 0 # rjenkins1
>> item osd.5 weight 1.000
>> item osd.6 weight 1.000
>> item osd.7 weight 1.000
>> item osd.8 weight 1.000
>> item osd.9 weight 1.000
>> item osd.10 weight 1.000
>> item osd.11 weight 1.000
>> item osd.12 weight 1.000
>> item osd.13 weight 1.000
>> item osd.14 weight 1.000
>> item osd.4 weight 1.000
>> item osd.47 weight 1.000
>> item osd.48 weight 1.000
>> item osd.49 weight 1.000
>> item osd.50 weight 1.000
>> item osd.51 weight 1.000
>> }
>> host cstore2 {
>> id -11 # do not change unnecessarily
>> # weight 20.000
>> alg straw
>> hash 0 # rjenkins1
>> item osd.15 weight 1.000
>> item osd.16 weight 1.000
>> item osd.17 weight 1.000
>> item osd.18 weight 1.000
>> item osd.19 weight 1.000
>> item osd.20 weight 1.000
>> item osd.21 weight 1.000
>> item osd.22 weight 1.000
>> item osd.23 weight 1.000
>> item osd.24 weight 1.000
>> item osd.41 weight 1.000
>> item osd.42 weight 1.000
>> item osd.43 weight 1.000
>> item osd.44 weight 1.000
>> item osd.45 weight 1.000
>> item osd.46 weight 1.000
>> item osd.52 weight 1.000
>> item osd.53 weight 1.000
>> item osd.54 weight 1.000
>> item osd.55 weight 1.000
>> item osd.56 weight 0.000
>> item osd.57 weight 0.000
>> }
>> host cstore3 {
>> id -12 # do not change unnecessarily
>> # weight 16.000
>> alg straw
>> hash 0 # rjenkins1
>> item osd.25 weight 1.000
>> item osd.26 weight 1.000
>> item osd.27 weight 1.000
>> item osd.28 weight 1.000
>> item osd.29 weight 1.000
>> item osd.30 weight 1.000
>> item osd.31 weight 1.000
>> item osd.32 weight 1.000
>> item osd.33 weight 1.000
>> item osd.34 weight 1.000
>> item osd.35 weight 1.000
>> item osd.36 weight 1.000
>> item osd.37 weight 1.000
>> item osd.38 weight 1.000
>> item osd.39 weight 1.000
>> item osd.40 weight 1.000
>> }
>> host cstore4 {
>> id -13 # do not change unnecessarily
>> # weight 7.640
>> alg straw
>> hash 0 # rjenkins1
>> item osd.62 weight 0.550
>> item osd.63 weight 0.550
>> item osd.64 weight 0.550
>> item osd.65 weight 0.550
>> item osd.66 weight 0.550
>> item osd.67 weight 0.550
>> item osd.68 weight 0.550
>> item osd.69 weight 0.550
>> item osd.70 weight 0.270
>> item osd.71 weight 0.270
>> item osd.72 weight 0.270
>> item osd.73 weight 0.270
>> item osd.74 weight 0.270
>> item osd.75 weight 0.270
>> item osd.76 weight 0.270
>> item osd.77 weight 0.270
>> item osd.78 weight 0.270
>> item osd.79 weight 0.270
>> item osd.80 weight 0.270
>> item osd.81 weight 0.270
>> }
>> host cstore5 {
>> id -14 # do not change unnecessarily
>> # weight 3.320
>> alg straw
>> hash 0 # rjenkins1
>> item osd.0 weight 0.400
>> item osd.1 weight 0.500
>> item osd.3 weight 0.400
>> item osd.58 weight 0.400
>> item osd.59 weight 0.540
>> item osd.60 weight 0.540
>> item osd.61 weight 0.540
>> }
>> rack unknownrack {
>> id -3 # do not change unnecessarily
>> # weight 67.960
>> alg straw
>> hash 0 # rjenkins1
>> item gstore5 weight 5.000
>> item cstore1 weight 16.000
>> item cstore2 weight 20.000
>> item cstore3 weight 16.000
>> item cstore4 weight 7.640
>> item cstore5 weight 3.320
>> }
>> pool default {
>> id -1 # do not change unnecessarily
>> # weight 67.960
>> alg straw
>> hash 0 # rjenkins1
>> item unknownrack weight 67.960
>> }
>> host gstore1 {
>> id -6 # do not change unnecessarily
>> # weight 32.760
>> alg straw
>> hash 0 # rjenkins1
>> item osd.82 weight 2.730
>> item osd.83 weight 2.730
>> item osd.84 weight 2.730
>> item osd.85 weight 2.730
>> item osd.86 weight 2.730
>> item osd.87 weight 2.730
>> item osd.88 weight 2.730
>> item osd.89 weight 2.730
>> item osd.90 weight 2.730
>> item osd.91 weight 2.730
>> item osd.92 weight 2.730
>> item osd.93 weight 2.730
>> }
>> datacenter datacenter-cod {
>> id -4 # do not change unnecessarily
>> # weight 32.760
>> alg straw
>> hash 0 # rjenkins1
>> item gstore1 weight 32.760
>> }
>> host gstore2 {
>> id -7 # do not change unnecessarily
>> # weight 32.760
>> alg straw
>> hash 0 # rjenkins1
>> item osd.94 weight 2.730
>> item osd.95 weight 2.730
>> item osd.96 weight 2.730
>> item osd.97 weight 2.730
>> item osd.98 weight 2.730
>> item osd.99 weight 2.730
>> item osd.100 weight 2.730
>> item osd.101 weight 2.730
>> item osd.102 weight 2.730
>> item osd.103 weight 2.730
>> item osd.104 weight 2.730
>> item osd.105 weight 2.730
>> }
>> datacenter datacenter-rcod {
>> id -5 # do not change unnecessarily
>> # weight 32.760
>> alg straw
>> hash 0 # rjenkins1
>> item gstore2 weight 32.760
>> }
>> pool iscsi {
>> id -2 # do not change unnecessarily
>> # weight 65.520
>> alg straw
>> hash 0 # rjenkins1
>> item datacenter-cod weight 32.760
>> item datacenter-rcod weight 32.760
>> }
>>
>> # rules
>> rule data {
>> ruleset 0
>> type replicated
>> min_size 1
>> max_size 10
>> step take default
>> step chooseleaf firstn 0 type host
>> step emit
>> }
>> rule metadata {
>> ruleset 1
>> type replicated
>> min_size 1
>> max_size 10
>> step take default
>> step chooseleaf firstn 0 type host
>> step emit
>> }
>> rule rbd {
>> ruleset 2
>> type replicated
>> min_size 1
>> max_size 10
>> step take default
>> step chooseleaf firstn 0 type host
>> step emit
>> }
>>
>> # end crush map
>>
>>
>> Restarting the mons doesn't help.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
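The test mappings Greg asks about can be run offline with crushtool against the compiled map, without touching the running cluster. A minimal sketch, assuming the map is first pulled from the monitors, that ruleset 0 with 2 replicas is the case of interest, that the file name crushmap.bin is illustrative, and that the installed crushtool is new enough to support these --test options:

    # pull the compiled crush map the monitors are currently serving
    ceph osd getcrushmap -o crushmap.bin

    # run test mappings for ruleset 0 with 2 replicas and summarize the results
    crushtool -i crushmap.bin --test --rule 0 --num-rep 2 --show-statistics

    # list any inputs that map to fewer OSDs than requested
    crushtool -i crushmap.bin --test --rule 0 --num-rep 2 --show-bad-mappings

Repeat with the --rule and --num-rep values that match the rules and replica counts the pools actually use. If crushtool itself misbehaves or reports bad mappings here, that points at the edited map rather than at the OSD daemons.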
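Since the crashes started right after the crush map change and restarting the mons doesn't help, one possible way out (a sketch only, not verified against this cluster; file names are again illustrative) is to decompile the current map, revert or correct the recent edit in text form, recompile, and re-test before injecting it back:

    # decompile the binary map into an editable text file
    crushtool -d crushmap.bin -o crushmap.txt

    # edit crushmap.txt to revert the problematic change, then recompile it
    crushtool -c crushmap.txt -o crushmap-fixed.bin

    # once the recompiled map passes the crushtool --test runs above,
    # inject it back into the cluster via the monitors
    ceph osd setcrushmap -i crushmap-fixed.bin

Because the map is held by the monitors, setcrushmap should be usable even while the OSDs are down, but treat that as an assumption to verify rather than a guarantee.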