Symptoms are like those in http://tracker.ceph.com/issues/4699: on all OSDs the ceph-osd process crashes with a segfault. If I stop the MON daemons I can start the OSDs, but as soon as I start the MONs again, all the OSDs die again.

More detailed log:

 0> 2013-07-15 16:42:05.001242 7ffe5a6fc700 -1 *** Caught signal (Segmentation fault) **
 in thread 7ffe5a6fc700

 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
 1: /usr/bin/ceph-osd() [0x790e5a]
 2: (()+0xfcb0) [0x7ffe6b729cb0]
 3: /usr/bin/ceph-osd() [0x893879]
 4: (crush_do_rule()+0x1e5) [0x894065]
 5: (CrushWrapper::do_rule(int, int, std::vector<int, std::allocator<int> >&, int, std::vector<unsigned int, std::allocator<unsigned int> > const&) const+0x7a) [0x81b2ba]
 6: (OSDMap::_pg_to_osds(pg_pool_t const&, pg_t, std::vector<int, std::allocator<int> >&) const+0x8f) [0x80d7cf]
 7: (OSDMap::pg_to_up_acting_osds(pg_t, std::vector<int, std::allocator<int> >&, std::vector<int, std::allocator<int> >&) const+0xa6) [0x80d8a6]
 8: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&, PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>, std::less<boost::intrusive_ptr<PG> >, std::allocator<boost::intrusive_ptr<PG> > >*)+0x190) [0x631b70]
 9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x244) [0x632254]
 10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x66d0f2]
 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x835a66]
 12: (ThreadPool::WorkThread::entry()+0x10) [0x837890]
 13: (()+0x7e9a) [0x7ffe6b721e9a]
 14: (clone()+0x6d) [0x7ffe69f9fccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---

2013/7/14 Vladislav Gorbunov <vadikgo@xxxxxxxxx>:
> Hello!
>
> After changing the crush map, all OSDs (ceph version 0.61.4
> (1669132fcfc27d0c0b5e5bb93ade59d147e23404)) in pool default crash
> with the error:
> 2013-07-14 17:26:23.755432 7f0c963ad700 -1 *** Caught signal
> (Segmentation fault) **
> in thread 7f0c963ad700
> ...skipping...
> 10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x12) [0x66d0f2]
> 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x835a66]
> 12: (ThreadPool::WorkThread::entry()+0x10) [0x837890]
> 13: (()+0x7e9a) [0x7fac4e597e9a]
> 14: (clone()+0x6d) [0x7fac4ce15ccd]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
> 0/ 5 none
> 0/ 1 lockdep
> 0/ 1 context
> 1/ 1 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 1 buffer
> 0/ 1 timer
> 0/ 1 filer
> 0/ 1 striper
> 0/ 1 objecter
> 0/ 5 rados
> 0/ 5 rbd
> 0/ 5 journaler
> 0/ 5 objectcacher
> 0/ 5 client
> 0/ 5 osd
> 0/ 5 optracker
> 0/ 5 objclass
> 1/ 3 filestore
> 1/ 3 journal
> 0/ 5 ms
> 1/ 5 mon
> 0/10 monc
> 0/ 5 paxos
> 0/ 5 tp
> 1/ 5 auth
> 1/ 5 crypto
> 1/ 1 finisher
> 1/ 5 heartbeatmap
> 1/ 5 perfcounter
> 1/ 5 rgw
> 1/ 5 hadoop
> 1/ 5 javaclient
> 1/ 5 asok
> 1/ 1 throttle
> -2/-2 (syslog threshold)
> -1/-1 (stderr threshold)
> max_recent 10000
> max_new 1000
> log_file /var/log/ceph/ceph-osd.2.log
> --- end dump of recent events ---
>
> `ceph osd tree` completely ignores osd start/stop and always shows this map:
> # id    weight  type name       up/down reweight
> -2      65.52   pool iscsi
> -4      32.76       datacenter datacenter-cod
> -6      32.76           host gstore1
> 82      2.73                osd.82  up      1
> 83      2.73                osd.83  up      1
> 84      2.73                osd.84  up      1
> 85      2.73                osd.85  up      1
> 86      2.73                osd.86  up      1
> 87      2.73                osd.87  up      1
> 88      2.73                osd.88  up      1
> 89      2.73                osd.89  up      1
> 90      2.73                osd.90  up      1
> 91      2.73                osd.91  up      1
> 92      2.73                osd.92  up      1
> 93      2.73                osd.93  up      1
> -5      32.76       datacenter datacenter-rcod
> -7      32.76           host gstore2
> 94      2.73                osd.94  up      1
> 95      2.73                osd.95  up      1
> 96      2.73                osd.96  up      1
> 97      2.73                osd.97  up      1
> 98      2.73                osd.98  up      1
> 99      2.73                osd.99  up      1
> 100     2.73                osd.100 up      1
> 101     2.73                osd.101 up      1
> 102     2.73                osd.102 up      1
> 103     2.73                osd.103 up      1
> 104     2.73                osd.104 up      1
> 105     2.73                osd.105 up      1
> -1      68.96   pool default
> -3      68.96       rack unknownrack
> -9      5               host gstore5
> 2       5                   osd.2   down    1
> -10     16              host cstore1
> 5       1                   osd.5   down    0
> 6       1                   osd.6   down    0
> 7       1                   osd.7   down    0
> 8       1                   osd.8   down    0
> 9       1                   osd.9   down    0
> 10      1                   osd.10  down    0
> 11      1                   osd.11  down    0
> 12      1                   osd.12  down    0
> 13      1                   osd.13  down    0
> 14      1                   osd.14  down    0
> 4       1                   osd.4   down    0
> 47      1                   osd.47  down    0
> 48      1                   osd.48  down    0
> 49      1                   osd.49  down    0
> 50      1                   osd.50  down    0
> 51      1                   osd.51  down    0
> -11     21              host cstore2
> 15      1                   osd.15  down    0
> 16      1                   osd.16  down    0
> 17      1                   osd.17  down    0
> 18      1                   osd.18  down    0
> 19      1                   osd.19  down    0
> 20      1                   osd.20  down    0
> 21      1                   osd.21  down    0
> 22      1                   osd.22  down    0
> 23      1                   osd.23  down    0
> 24      1                   osd.24  down    0
> 41      1                   osd.41  down    0
> 42      1                   osd.42  down    0
> 43      1                   osd.43  down    0
> 44      1                   osd.44  down    0
> 45      1                   osd.45  down    0
> 46      1                   osd.46  down    0
> 52      1                   osd.52  down    0
> 53      1                   osd.53  down    0
> 54      1                   osd.54  down    0
> 55      1                   osd.55  down    0
> 56      1                   osd.56  up      1
> 57      0                   osd.57  up      1
> -12     16              host cstore3
> 25      1                   osd.25  down    0
> 26      1                   osd.26  down    0
> 27      1                   osd.27  down    0
> 28      1                   osd.28  down    0
> 29      1                   osd.29  down    0
> 30      1                   osd.30  down    0
> 31      1                   osd.31  down    0
> 32      1                   osd.32  down    0
> 33      1                   osd.33  down    0
> 34      1                   osd.34  down    0
> 35      1                   osd.35  down    0
> 36      1                   osd.36  down    0
> 37      1                   osd.37  down    0
> 38      1                   osd.38  down    0
> 39      1                   osd.39  down    0
> 40      1                   osd.40  down    0
> -13     7.64            host cstore4
> 62      0.55                osd.62  down    0
> 63      0.55                osd.63  down    0
> 64      0.55                osd.64  down    0
> 65      0.55                osd.65  down    0
> 66      0.55                osd.66  down    0
> 67      0.55                osd.67  down    0
> 68      0.55                osd.68  down    0
> 69      0.55                osd.69  down    0
> 70      0.27                osd.70  down    0
> 71      0.27                osd.71  down    0
> 72      0.27                osd.72  down    0
> 73      0.27                osd.73  down    0
> 74      0.27                osd.74  down    0
> 75      0.27                osd.75  down    0
> 76      0.27                osd.76  down    0
> 77      0.27                osd.77  down    0
> 78      0.27                osd.78  down    0
> 79      0.27                osd.79  down    0
> 80      0.27                osd.80  down    0
> 81      0.27                osd.81  down    0
> -14     3.32            host cstore5
> 0       0.4                 osd.0   down    0
> 1       0.5                 osd.1   down    0
> 3       0.4                 osd.3   down    0
> 58      0.4                 osd.58  down    1
> 59      0.54                osd.59  down    1
> 60      0.54                osd.60  down    1
> 61      0.54                osd.61  down    1
>
> The crush map is:
> # begin crush map
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
> device 8 osd.8
> device 9 osd.9
> device 10 osd.10
> device 11 osd.11
> device 12 osd.12
> device 13 osd.13
> device 14 osd.14
> device 15 osd.15
> device 16 osd.16
> device 17 osd.17
> device 18 osd.18
> device 19 osd.19
> device 20 osd.20
> device 21 osd.21
> device 22 osd.22
> device 23 osd.23
> device 24 osd.24
> device 25 osd.25
> device 26 osd.26
> device 27 osd.27
> device 28 osd.28
> device 29 osd.29
> device 30 osd.30
> device 31 osd.31
> device 32 osd.32
> device 33 osd.33
> device 34 osd.34
> device 35 osd.35
> device 36 osd.36
> device 37 osd.37
> device 38 osd.38
> device 39 osd.39
> device 40 osd.40
> device 41 osd.41
> device 42 osd.42
> device 43 osd.43
> device 44 osd.44
> device 45 osd.45
> device 46 osd.46
> device 47 osd.47
> device 48 osd.48
> device 49 osd.49
> device 50 osd.50
> device 51 osd.51
> device 52 osd.52
> device 53 osd.53
> device 54 osd.54
> device 55 osd.55
> device 56 osd.56
> device 57 osd.57
> device 58 osd.58
> device 59 osd.59
> device 60 osd.60
> device 61 osd.61
> device 62 osd.62
> device 63 osd.63
> device 64 osd.64
> device 65 osd.65
> device 66 osd.66
> device 67 osd.67
> device 68 osd.68
> device 69 osd.69
> device 70 osd.70
> device 71 osd.71
> device 72 osd.72
> device 73 osd.73
> device 74 osd.74
> device 75 osd.75
> device 76 osd.76
> device 77 osd.77
> device 78 osd.78
> device 79 osd.79
> device 80 osd.80
> device 81 osd.81
> device 82 osd.82
> device 83 osd.83
> device 84 osd.84
> device 85 osd.85
> device 86 osd.86
> device 87 osd.87
> device 88 osd.88
> device 89 osd.89
> device 90 osd.90
> device 91 osd.91
> device 92 osd.92
> device 93 osd.93
> device 94 osd.94
> device 95 osd.95
> device 96 osd.96
> device 97 osd.97
> device 98 osd.98
> device 99 osd.99
> device 100 osd.100
> device 101 osd.101
> device 102 osd.102
> device 103 osd.103
> device 104 osd.104
> device 105 osd.105
>
> # types
> type 0 osd
> type 1 host
> type 2 rack
> type 3 row
> type 4 room
> type 5 datacenter
> type 6 pool
>
> # buckets
> host gstore5 {
>         id -9           # do not change unnecessarily
>         # weight 5.000
>         alg straw
>         hash 0  # rjenkins1
>         item osd.2 weight 5.000
> }
> host cstore1 {
>         id -10          # do not change unnecessarily
>         # weight 16.000
>         alg straw
>         hash 0  # rjenkins1
>         item osd.5 weight 1.000
>         item osd.6 weight 1.000
>         item osd.7 weight 1.000
>         item osd.8 weight 1.000
>         item osd.9 weight 1.000
>         item osd.10 weight 1.000
>         item osd.11 weight 1.000
>         item osd.12 weight 1.000
>         item osd.13 weight 1.000
>         item osd.14 weight 1.000
>         item osd.4 weight 1.000
>         item osd.47 weight 1.000
>         item osd.48 weight 1.000
>         item osd.49 weight 1.000
>         item osd.50 weight 1.000
>         item osd.51 weight 1.000
> }
> host cstore2 {
>         id -11          # do not change unnecessarily
>         # weight 20.000
>         alg straw
>         hash 0  # rjenkins1
>         item osd.15 weight 1.000
>         item osd.16 weight 1.000
>         item osd.17 weight 1.000
>         item osd.18 weight 1.000
>         item osd.19 weight 1.000
>         item osd.20 weight 1.000
>         item osd.21 weight 1.000
>         item osd.22 weight 1.000
>         item osd.23 weight 1.000
>         item osd.24 weight 1.000
>         item osd.41 weight 1.000
>         item osd.42 weight 1.000
>         item osd.43 weight 1.000
>         item osd.44 weight 1.000
>         item osd.45 weight 1.000
>         item osd.46 weight 1.000
>         item osd.52 weight 1.000
>         item osd.53 weight 1.000
>         item osd.54 weight 1.000
>         item osd.55 weight 1.000
>         item osd.56 weight 0.000
>         item osd.57 weight 0.000
> }
> host cstore3 {
>         id -12          # do not change unnecessarily
>         # weight 16.000
>         alg straw
>         hash 0  # rjenkins1
>         item osd.25 weight 1.000
>         item osd.26 weight 1.000
>         item osd.27 weight 1.000
>         item osd.28 weight 1.000
>         item osd.29 weight 1.000
>         item osd.30 weight 1.000
>         item osd.31 weight 1.000
>         item osd.32 weight 1.000
>         item osd.33 weight 1.000
>         item osd.34 weight 1.000
>         item osd.35 weight 1.000
>         item osd.36 weight 1.000
>         item osd.37 weight 1.000
>         item osd.38 weight 1.000
>         item osd.39 weight 1.000
>         item osd.40 weight 1.000
> }
> host cstore4 {
>         id -13          # do not change unnecessarily
>         # weight 7.640
>         alg straw
>         hash 0  # rjenkins1
>         item osd.62 weight 0.550
>         item osd.63 weight 0.550
>         item osd.64 weight 0.550
>         item osd.65 weight 0.550
>         item osd.66 weight 0.550
>         item osd.67 weight 0.550
>         item osd.68 weight 0.550
>         item osd.69 weight 0.550
>         item osd.70 weight 0.270
>         item osd.71 weight 0.270
>         item osd.72 weight 0.270
>         item osd.73 weight 0.270
>         item osd.74 weight 0.270
>         item osd.75 weight 0.270
>         item osd.76 weight 0.270
>         item osd.77 weight 0.270
>         item osd.78 weight 0.270
>         item osd.79 weight 0.270
>         item osd.80 weight 0.270
>         item osd.81 weight 0.270
> }
> host cstore5 {
>         id -14          # do not change unnecessarily
>         # weight 3.320
>         alg straw
>         hash 0  # rjenkins1
>         item osd.0 weight 0.400
>         item osd.1 weight 0.500
>         item osd.3 weight 0.400
>         item osd.58 weight 0.400
>         item osd.59 weight 0.540
>         item osd.60 weight 0.540
>         item osd.61 weight 0.540
> }
> rack unknownrack {
>         id -3           # do not change unnecessarily
>         # weight 67.960
>         alg straw
>         hash 0  # rjenkins1
>         item gstore5 weight 5.000
>         item cstore1 weight 16.000
>         item cstore2 weight 20.000
>         item cstore3 weight 16.000
>         item cstore4 weight 7.640
>         item cstore5 weight 3.320
> }
> pool default {
>         id -1           # do not change unnecessarily
>         # weight 67.960
>         alg straw
>         hash 0  # rjenkins1
>         item unknownrack weight 67.960
> }
> host gstore1 {
>         id -6           # do not change unnecessarily
>         # weight 32.760
>         alg straw
>         hash 0  # rjenkins1
>         item osd.82 weight 2.730
>         item osd.83 weight 2.730
>         item osd.84 weight 2.730
>         item osd.85 weight 2.730
>         item osd.86 weight 2.730
>         item osd.87 weight 2.730
>         item osd.88 weight 2.730
>         item osd.89 weight 2.730
>         item osd.90 weight 2.730
>         item osd.91 weight 2.730
>         item osd.92 weight 2.730
>         item osd.93 weight 2.730
> }
> datacenter datacenter-cod {
>         id -4           # do not change unnecessarily
>         # weight 32.760
>         alg straw
>         hash 0  # rjenkins1
>         item gstore1 weight 32.760
> }
> host gstore2 {
>         id -7           # do not change unnecessarily
>         # weight 32.760
>         alg straw
>         hash 0  # rjenkins1
>         item osd.94 weight 2.730
>         item osd.95 weight 2.730
>         item osd.96 weight 2.730
>         item osd.97 weight 2.730
>         item osd.98 weight 2.730
>         item osd.99 weight 2.730
>         item osd.100 weight 2.730
>         item osd.101 weight 2.730
>         item osd.102 weight 2.730
>         item osd.103 weight 2.730
>         item osd.104 weight 2.730
>         item osd.105 weight 2.730
> }
> datacenter datacenter-rcod {
>         id -5           # do not change unnecessarily
>         # weight 32.760
>         alg straw
>         hash 0  # rjenkins1
>         item gstore2 weight 32.760
> }
> pool iscsi {
>         id -2           # do not change unnecessarily
>         # weight 65.520
>         alg straw
>         hash 0  # rjenkins1
>         item datacenter-cod weight 32.760
>         item datacenter-rcod weight 32.760
> }
>
> # rules
> rule data {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
> rule metadata {
>         ruleset 1
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
> rule rbd {
>         ruleset 2
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> # end crush map
>
>
> Restarting the mon doesn't help.
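
Since the segfault is inside crush_do_rule() and started right after the map change, one thing that might be worth trying is pulling the current CRUSH map from the monitors and exercising it offline with crushtool. This is only a sketch, not a verified fix: the file names are placeholders, --rule 0 and --num-rep 2 are example values that should be adjusted to the pools' actual rulesets and replica counts, and the exact --test options available depend on the crushtool version:

  # dump and decompile the map the monitors are currently serving
  ceph osd getcrushmap -o /tmp/crushmap.bin
  crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt

  # recompile the text map and run the offline mapping test for one rule
  crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
  crushtool -i /tmp/crushmap.new --test --rule 0 --num-rep 2 --show-statistics

If crushtool itself misbehaves on that map, the decompiled text plus the `objdump -rdS /usr/bin/ceph-osd` output mentioned in the backtrace NOTE is probably what the developers would need to pin down the crash.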