Re: all OSDs crash on start

Symptoms are like those in http://tracker.ceph.com/issues/4699.

On all OSDs the ceph-osd process crashes with a segfault.

If I stop the MON daemons I can start the OSDs, but as soon as I start
the MONs again, all the OSDs die once more.
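
For reference, the crush map can be pulled from the monitors and exercised
offline with crushtool. This is only a rough sketch of the check I have in
mind (file names are examples), and I am not sure it reproduces the same
crash outside the OSD process:

  ceph osd getcrushmap -o /tmp/crushmap
  crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
  crushtool -i /tmp/crushmap --test --rule 0 --num-rep 2

If crushtool itself dies, or maps nothing for a rule, that would at least
point at the map rather than at the OSDs.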

A more detailed log:
     0> 2013-07-15 16:42:05.001242 7ffe5a6fc700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7ffe5a6fc700

 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
 1: /usr/bin/ceph-osd() [0x790e5a]
 2: (()+0xfcb0) [0x7ffe6b729cb0]
 3: /usr/bin/ceph-osd() [0x893879]
 4: (crush_do_rule()+0x1e5) [0x894065]
 5: (CrushWrapper::do_rule(int, int, std::vector<int,
std::allocator<int> >&, int, std::vector<unsigned int,
std::allocator<unsigned int> > const&) const+0x7a) [0x81b2ba]
 6: (OSDMap::_pg_to_osds(pg_pool_t const&, pg_t, std::vector<int,
std::allocator<int> >&) const+0x8f) [0x80d7cf]
 7: (OSDMap::pg_to_up_acting_osds(pg_t, std::vector<int,
std::allocator<int> >&, std::vector<int, std::allocator<int> >&)
const+0xa6) [0x80d8a6]
 8: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&,
PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>,
std::less<boost::intrusive_ptr<PG> >,
std::allocator<boost::intrusive_ptr<PG> > >*)+0x190) [0x631b70]
 9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> >
const&, ThreadPool::TPHandle&)+0x244) [0x632254]
 10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> >
const&, ThreadPool::TPHandle&)+0x12) [0x66d0f2]
 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x835a66]
 12: (ThreadPool::WorkThread::entry()+0x10) [0x837890]
 13: (()+0x7e9a) [0x7ffe6b721e9a]
 14: (clone()+0x6d) [0x7ffe69f9fccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-osd.2.log
--- end dump of recent events ---
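
I can run the objdump invocation mentioned in the NOTE above over the
binary and keep the output around (the output path is just an example):

  objdump -rdS /usr/bin/ceph-osd > /tmp/ceph-osd.dis

If more detail would help, I can also raise the osd/crush debug levels in
ceph.conf on the OSD hosts before the next restart, something along these
lines (the values are only a guess at what is useful):

  [osd]
      debug osd = 20
      debug crush = 20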

2013/7/14 Vladislav Gorbunov <vadikgo@xxxxxxxxx>:
> Hello!
>
> After changing the crush map, all OSDs (ceph version 0.61.4
> (1669132fcfc27d0c0b5e5bb93ade59d147e23404)) in pool default crash
> with the error:
> 2013-07-14 17:26:23.755432 7f0c963ad700 -1 *** Caught signal
> (Segmentation fault) **
>  in thread 7f0c963ad700
> ...skipping...
>  10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> >
> const&, ThreadPool::TPHandle&)+0x12) [0x66d0f2]
>  11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x835a66]
>  12: (ThreadPool::WorkThread::entry()+0x10) [0x837890]
>  13: (()+0x7e9a) [0x7fac4e597e9a]
>  14: (clone()+0x6d) [0x7fac4ce15ccd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    0/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/ 5 hadoop
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.2.log
> --- end dump of recent events ---
>
> ceph osd tree completely ignores OSD start/stop and always shows this map:
> # id weight type name up/down reweight
> -2 65.52 pool iscsi
> -4 32.76 datacenter datacenter-cod
> -6 32.76 host gstore1
> 82 2.73 osd.82 up 1
> 83 2.73 osd.83 up 1
> 84 2.73 osd.84 up 1
> 85 2.73 osd.85 up 1
> 86 2.73 osd.86 up 1
> 87 2.73 osd.87 up 1
> 88 2.73 osd.88 up 1
> 89 2.73 osd.89 up 1
> 90 2.73 osd.90 up 1
> 91 2.73 osd.91 up 1
> 92 2.73 osd.92 up 1
> 93 2.73 osd.93 up 1
> -5 32.76 datacenter datacenter-rcod
> -7 32.76 host gstore2
> 94 2.73 osd.94 up 1
> 95 2.73 osd.95 up 1
> 96 2.73 osd.96 up 1
> 97 2.73 osd.97 up 1
> 98 2.73 osd.98 up 1
> 99 2.73 osd.99 up 1
> 100 2.73 osd.100 up 1
> 101 2.73 osd.101 up 1
> 102 2.73 osd.102 up 1
> 103 2.73 osd.103 up 1
> 104 2.73 osd.104 up 1
> 105 2.73 osd.105 up 1
> -1 68.96 pool default
> -3 68.96 rack unknownrack
> -9 5 host gstore5
> 2 5 osd.2 down 1
> -10 16 host cstore1
> 5 1 osd.5 down 0
> 6 1 osd.6 down 0
> 7 1 osd.7 down 0
> 8 1 osd.8 down 0
> 9 1 osd.9 down 0
> 10 1 osd.10 down 0
> 11 1 osd.11 down 0
> 12 1 osd.12 down 0
> 13 1 osd.13 down 0
> 14 1 osd.14 down 0
> 4 1 osd.4 down 0
> 47 1 osd.47 down 0
> 48 1 osd.48 down 0
> 49 1 osd.49 down 0
> 50 1 osd.50 down 0
> 51 1 osd.51 down 0
> -11 21 host cstore2
> 15 1 osd.15 down 0
> 16 1 osd.16 down 0
> 17 1 osd.17 down 0
> 18 1 osd.18 down 0
> 19 1 osd.19 down 0
> 20 1 osd.20 down 0
> 21 1 osd.21 down 0
> 22 1 osd.22 down 0
> 23 1 osd.23 down 0
> 24 1 osd.24 down 0
> 41 1 osd.41 down 0
> 42 1 osd.42 down 0
> 43 1 osd.43 down 0
> 44 1 osd.44 down 0
> 45 1 osd.45 down 0
> 46 1 osd.46 down 0
> 52 1 osd.52 down 0
> 53 1 osd.53 down 0
> 54 1 osd.54 down 0
> 55 1 osd.55 down 0
> 56 1 osd.56 up 1
> 57 0 osd.57 up 1
> -12 16 host cstore3
> 25 1 osd.25 down 0
> 26 1 osd.26 down 0
> 27 1 osd.27 down 0
> 28 1 osd.28 down 0
> 29 1 osd.29 down 0
> 30 1 osd.30 down 0
> 31 1 osd.31 down 0
> 32 1 osd.32 down 0
> 33 1 osd.33 down 0
> 34 1 osd.34 down 0
> 35 1 osd.35 down 0
> 36 1 osd.36 down 0
> 37 1 osd.37 down 0
> 38 1 osd.38 down 0
> 39 1 osd.39 down 0
> 40 1 osd.40 down 0
> -13 7.64 host cstore4
> 62 0.55 osd.62 down 0
> 63 0.55 osd.63 down 0
> 64 0.55 osd.64 down 0
> 65 0.55 osd.65 down 0
> 66 0.55 osd.66 down 0
> 67 0.55 osd.67 down 0
> 68 0.55 osd.68 down 0
> 69 0.55 osd.69 down 0
> 70 0.27 osd.70 down 0
> 71 0.27 osd.71 down 0
> 72 0.27 osd.72 down 0
> 73 0.27 osd.73 down 0
> 74 0.27 osd.74 down 0
> 75 0.27 osd.75 down 0
> 76 0.27 osd.76 down 0
> 77 0.27 osd.77 down 0
> 78 0.27 osd.78 down 0
> 79 0.27 osd.79 down 0
> 80 0.27 osd.80 down 0
> 81 0.27 osd.81 down 0
> -14 3.32 host cstore5
> 0 0.4 osd.0 down 0
> 1 0.5 osd.1 down 0
> 3 0.4 osd.3 down 0
> 58 0.4 osd.58 down 1
> 59 0.54 osd.59 down 1
> 60 0.54 osd.60 down 1
> 61 0.54 osd.61 down 1
>
> The crush map is:
> # begin crush map
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
> device 8 osd.8
> device 9 osd.9
> device 10 osd.10
> device 11 osd.11
> device 12 osd.12
> device 13 osd.13
> device 14 osd.14
> device 15 osd.15
> device 16 osd.16
> device 17 osd.17
> device 18 osd.18
> device 19 osd.19
> device 20 osd.20
> device 21 osd.21
> device 22 osd.22
> device 23 osd.23
> device 24 osd.24
> device 25 osd.25
> device 26 osd.26
> device 27 osd.27
> device 28 osd.28
> device 29 osd.29
> device 30 osd.30
> device 31 osd.31
> device 32 osd.32
> device 33 osd.33
> device 34 osd.34
> device 35 osd.35
> device 36 osd.36
> device 37 osd.37
> device 38 osd.38
> device 39 osd.39
> device 40 osd.40
> device 41 osd.41
> device 42 osd.42
> device 43 osd.43
> device 44 osd.44
> device 45 osd.45
> device 46 osd.46
> device 47 osd.47
> device 48 osd.48
> device 49 osd.49
> device 50 osd.50
> device 51 osd.51
> device 52 osd.52
> device 53 osd.53
> device 54 osd.54
> device 55 osd.55
> device 56 osd.56
> device 57 osd.57
> device 58 osd.58
> device 59 osd.59
> device 60 osd.60
> device 61 osd.61
> device 62 osd.62
> device 63 osd.63
> device 64 osd.64
> device 65 osd.65
> device 66 osd.66
> device 67 osd.67
> device 68 osd.68
> device 69 osd.69
> device 70 osd.70
> device 71 osd.71
> device 72 osd.72
> device 73 osd.73
> device 74 osd.74
> device 75 osd.75
> device 76 osd.76
> device 77 osd.77
> device 78 osd.78
> device 79 osd.79
> device 80 osd.80
> device 81 osd.81
> device 82 osd.82
> device 83 osd.83
> device 84 osd.84
> device 85 osd.85
> device 86 osd.86
> device 87 osd.87
> device 88 osd.88
> device 89 osd.89
> device 90 osd.90
> device 91 osd.91
> device 92 osd.92
> device 93 osd.93
> device 94 osd.94
> device 95 osd.95
> device 96 osd.96
> device 97 osd.97
> device 98 osd.98
> device 99 osd.99
> device 100 osd.100
> device 101 osd.101
> device 102 osd.102
> device 103 osd.103
> device 104 osd.104
> device 105 osd.105
>
> # types
> type 0 osd
> type 1 host
> type 2 rack
> type 3 row
> type 4 room
> type 5 datacenter
> type 6 pool
>
> # buckets
> host gstore5 {
> id -9 # do not change unnecessarily
> # weight 5.000
> alg straw
> hash 0 # rjenkins1
> item osd.2 weight 5.000
> }
> host cstore1 {
> id -10 # do not change unnecessarily
> # weight 16.000
> alg straw
> hash 0 # rjenkins1
> item osd.5 weight 1.000
> item osd.6 weight 1.000
> item osd.7 weight 1.000
> item osd.8 weight 1.000
> item osd.9 weight 1.000
> item osd.10 weight 1.000
> item osd.11 weight 1.000
> item osd.12 weight 1.000
> item osd.13 weight 1.000
> item osd.14 weight 1.000
> item osd.4 weight 1.000
> item osd.47 weight 1.000
> item osd.48 weight 1.000
> item osd.49 weight 1.000
> item osd.50 weight 1.000
> item osd.51 weight 1.000
> }
> host cstore2 {
> id -11 # do not change unnecessarily
> # weight 20.000
> alg straw
> hash 0 # rjenkins1
> item osd.15 weight 1.000
> item osd.16 weight 1.000
> item osd.17 weight 1.000
> item osd.18 weight 1.000
> item osd.19 weight 1.000
> item osd.20 weight 1.000
> item osd.21 weight 1.000
> item osd.22 weight 1.000
> item osd.23 weight 1.000
> item osd.24 weight 1.000
> item osd.41 weight 1.000
> item osd.42 weight 1.000
> item osd.43 weight 1.000
> item osd.44 weight 1.000
> item osd.45 weight 1.000
> item osd.46 weight 1.000
> item osd.52 weight 1.000
> item osd.53 weight 1.000
> item osd.54 weight 1.000
> item osd.55 weight 1.000
> item osd.56 weight 0.000
> item osd.57 weight 0.000
> }
> host cstore3 {
> id -12 # do not change unnecessarily
> # weight 16.000
> alg straw
> hash 0 # rjenkins1
> item osd.25 weight 1.000
> item osd.26 weight 1.000
> item osd.27 weight 1.000
> item osd.28 weight 1.000
> item osd.29 weight 1.000
> item osd.30 weight 1.000
> item osd.31 weight 1.000
> item osd.32 weight 1.000
> item osd.33 weight 1.000
> item osd.34 weight 1.000
> item osd.35 weight 1.000
> item osd.36 weight 1.000
> item osd.37 weight 1.000
> item osd.38 weight 1.000
> item osd.39 weight 1.000
> item osd.40 weight 1.000
> }
> host cstore4 {
> id -13 # do not change unnecessarily
> # weight 7.640
> alg straw
> hash 0 # rjenkins1
> item osd.62 weight 0.550
> item osd.63 weight 0.550
> item osd.64 weight 0.550
> item osd.65 weight 0.550
> item osd.66 weight 0.550
> item osd.67 weight 0.550
> item osd.68 weight 0.550
> item osd.69 weight 0.550
> item osd.70 weight 0.270
> item osd.71 weight 0.270
> item osd.72 weight 0.270
> item osd.73 weight 0.270
> item osd.74 weight 0.270
> item osd.75 weight 0.270
> item osd.76 weight 0.270
> item osd.77 weight 0.270
> item osd.78 weight 0.270
> item osd.79 weight 0.270
> item osd.80 weight 0.270
> item osd.81 weight 0.270
> }
> host cstore5 {
> id -14 # do not change unnecessarily
> # weight 3.320
> alg straw
> hash 0 # rjenkins1
> item osd.0 weight 0.400
> item osd.1 weight 0.500
> item osd.3 weight 0.400
> item osd.58 weight 0.400
> item osd.59 weight 0.540
> item osd.60 weight 0.540
> item osd.61 weight 0.540
> }
> rack unknownrack {
> id -3 # do not change unnecessarily
> # weight 67.960
> alg straw
> hash 0 # rjenkins1
> item gstore5 weight 5.000
> item cstore1 weight 16.000
> item cstore2 weight 20.000
> item cstore3 weight 16.000
> item cstore4 weight 7.640
> item cstore5 weight 3.320
> }
> pool default {
> id -1 # do not change unnecessarily
> # weight 67.960
> alg straw
> hash 0 # rjenkins1
> item unknownrack weight 67.960
> }
> host gstore1 {
> id -6 # do not change unnecessarily
> # weight 32.760
> alg straw
> hash 0 # rjenkins1
> item osd.82 weight 2.730
> item osd.83 weight 2.730
> item osd.84 weight 2.730
> item osd.85 weight 2.730
> item osd.86 weight 2.730
> item osd.87 weight 2.730
> item osd.88 weight 2.730
> item osd.89 weight 2.730
> item osd.90 weight 2.730
> item osd.91 weight 2.730
> item osd.92 weight 2.730
> item osd.93 weight 2.730
> }
> datacenter datacenter-cod {
> id -4 # do not change unnecessarily
> # weight 32.760
> alg straw
> hash 0 # rjenkins1
> item gstore1 weight 32.760
> }
> host gstore2 {
> id -7 # do not change unnecessarily
> # weight 32.760
> alg straw
> hash 0 # rjenkins1
> item osd.94 weight 2.730
> item osd.95 weight 2.730
> item osd.96 weight 2.730
> item osd.97 weight 2.730
> item osd.98 weight 2.730
> item osd.99 weight 2.730
> item osd.100 weight 2.730
> item osd.101 weight 2.730
> item osd.102 weight 2.730
> item osd.103 weight 2.730
> item osd.104 weight 2.730
> item osd.105 weight 2.730
> }
> datacenter datacenter-rcod {
> id -5 # do not change unnecessarily
> # weight 32.760
> alg straw
> hash 0 # rjenkins1
> item gstore2 weight 32.760
> }
> pool iscsi {
> id -2 # do not change unnecessarily
> # weight 65.520
> alg straw
> hash 0 # rjenkins1
> item datacenter-cod weight 32.760
> item datacenter-rcod weight 32.760
> }
>
> # rules
> rule data {
> ruleset 0
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
> rule metadata {
> ruleset 1
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
> rule rbd {
> ruleset 2
> type replicated
> min_size 1
> max_size 10
> step take default
> step chooseleaf firstn 0 type host
> step emit
> }
>
> # end crush map
>
>
> Restarting the MONs doesn't help.
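
For completeness: if the edited map itself turns out to be the problem, a
corrected (or reverted) map could be recompiled and pushed back once the
monitors are up, roughly like this (file names are examples, untested on my
side):

  crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
  ceph osd setcrushmap -i /tmp/crushmap.new
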
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com