Re: all OSDs crash on start

It's probably not the same issue as that ticket, which was about the
OSD handling a lack of output incorrectly. (It might be handling the
output incorrectly in some other way, but hopefully not...)

Have you run this crush map through any test mappings yet?
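For example, something like this (a minimal sketch; crush.txt/crush.bin are
placeholder file names, and the exact test flags can vary a bit between
crushtool versions):

    crushtool -c crush.txt -o crush.bin    # compile the decompiled map back to binary
    crushtool -i crush.bin --test --rule 0 --num-rep 2 --show-statistics

If the test mappings look sane, the map can then be re-injected with
'ceph osd setcrushmap -i crush.bin'.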
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Sun, Jul 14, 2013 at 10:59 PM, Vladislav Gorbunov <vadikgo@xxxxxxxxx> wrote:
> Symptoms are like those in http://tracker.ceph.com/issues/4699
>
> On all OSDs, the ceph-osd process crashes with a segfault.
>
> If I stop the MON daemons I can start the OSDs, but if I start the MONs
> again, all the OSDs die again.
>
> More detailed log:
>      0> 2013-07-15 16:42:05.001242 7ffe5a6fc700 -1 *** Caught signal
> (Segmentation fault) **
>  in thread 7ffe5a6fc700
>
>  ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
>  1: /usr/bin/ceph-osd() [0x790e5a]
>  2: (()+0xfcb0) [0x7ffe6b729cb0]
>  3: /usr/bin/ceph-osd() [0x893879]
>  4: (crush_do_rule()+0x1e5) [0x894065]
>  5: (CrushWrapper::do_rule(int, int, std::vector<int,
> std::allocator<int> >&, int, std::vector<unsigned int,
> std::allocator<unsigned int> > const&) const+0x7a) [0x81b2ba]
>  6: (OSDMap::_pg_to_osds(pg_pool_t const&, pg_t, std::vector<int,
> std::allocator<int> >&) const+0x8f) [0x80d7cf]
>  7: (OSDMap::pg_to_up_acting_osds(pg_t, std::vector<int,
> std::allocator<int> >&, std::vector<int, std::allocator<int> >&)
> const+0xa6) [0x80d8a6]
>  8: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&,
> PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>,
> std::less<boost::intrusive_ptr<PG> >,
> std::allocator<boost::intrusive_ptr<PG> > >*)+0x190) [0x631b70]
>  9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> >
> const&, ThreadPool::TPHandle&)+0x244) [0x632254]
>  10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> >
> const&, ThreadPool::TPHandle&)+0x12) [0x66d0f2]
>  11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x835a66]
>  12: (ThreadPool::WorkThread::entry()+0x10) [0x837890]
>  13: (()+0x7e9a) [0x7ffe6b721e9a]
>  14: (clone()+0x6d) [0x7ffe69f9fccd]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 1 lockdep
>    0/ 1 context
>    1/ 1 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 1 buffer
>    0/ 1 timer
>    0/ 1 filer
>    0/ 1 striper
>    0/ 1 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 3 filestore
>    1/ 3 journal
>    0/ 5 ms
>    1/ 5 mon
>    0/10 monc
>    0/ 5 paxos
>    0/ 5 tp
>    1/ 5 auth
>    1/ 5 crypto
>    1/ 1 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/ 5 hadoop
>    1/ 5 javaclient
>    1/ 5 asok
>    1/ 1 throttle
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.2.log
> --- end dump of recent events ---
>
> 2013/7/14 Vladislav Gorbunov <vadikgo@xxxxxxxxx>:
>> Hello!
>>
>> After changing the crush map, all OSDs (ceph version 0.61.4
>> (1669132fcfc27d0c0b5e5bb93ade59d147e23404)) in pool default crash
>> with the error:
>> 2013-07-14 17:26:23.755432 7f0c963ad700 -1 *** Caught signal
>> (Segmentation fault) **
>>  in thread 7f0c963ad700
>> ...skipping...
>>  10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> >
>> const&, ThreadPool::TPHandle&)+0x12) [0x66d0f2]
>>  11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x835a66]
>>  12: (ThreadPool::WorkThread::entry()+0x10) [0x837890]
>>  13: (()+0x7e9a) [0x7fac4e597e9a]
>>  14: (clone()+0x6d) [0x7fac4ce15ccd]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> --- logging levels ---
>>    0/ 5 none
>>    0/ 1 lockdep
>>    0/ 1 context
>>    1/ 1 crush
>>    1/ 5 mds
>>    1/ 5 mds_balancer
>>    1/ 5 mds_locker
>>    1/ 5 mds_log
>>    1/ 5 mds_log_expire
>>    1/ 5 mds_migrator
>>    0/ 1 buffer
>>    0/ 1 timer
>>    0/ 1 filer
>>    0/ 1 striper
>>    0/ 1 objecter
>>    0/ 5 rados
>>    0/ 5 rbd
>>    0/ 5 journaler
>>    0/ 5 objectcacher
>>    0/ 5 client
>>    0/ 5 osd
>>    0/ 5 optracker
>>    0/ 5 objclass
>>    1/ 3 filestore
>>    1/ 3 journal
>>    0/ 5 ms
>>    1/ 5 mon
>>    0/10 monc
>>    0/ 5 paxos
>>    0/ 5 tp
>>    1/ 5 auth
>>    1/ 5 crypto
>>    1/ 1 finisher
>>    1/ 5 heartbeatmap
>>    1/ 5 perfcounter
>>    1/ 5 rgw
>>    1/ 5 hadoop
>>    1/ 5 javaclient
>>    1/ 5 asok
>>    1/ 1 throttle
>>   -2/-2 (syslog threshold)
>>   -1/-1 (stderr threshold)
>>   max_recent     10000
>>   max_new         1000
>>   log_file /var/log/ceph/ceph-osd.2.log
>> --- end dump of recent events ---
>>
>> ceph osd tree completely ignores OSD start/stop and always shows this map:
>> # id    weight  type name                       up/down  reweight
>> -2      65.52   pool iscsi
>> -4      32.76       datacenter datacenter-cod
>> -6      32.76           host gstore1
>> 82      2.73                osd.82              up       1
>> 83      2.73                osd.83              up       1
>> 84      2.73                osd.84              up       1
>> 85      2.73                osd.85              up       1
>> 86      2.73                osd.86              up       1
>> 87      2.73                osd.87              up       1
>> 88      2.73                osd.88              up       1
>> 89      2.73                osd.89              up       1
>> 90      2.73                osd.90              up       1
>> 91      2.73                osd.91              up       1
>> 92      2.73                osd.92              up       1
>> 93      2.73                osd.93              up       1
>> -5      32.76       datacenter datacenter-rcod
>> -7      32.76           host gstore2
>> 94      2.73                osd.94              up       1
>> 95      2.73                osd.95              up       1
>> 96      2.73                osd.96              up       1
>> 97      2.73                osd.97              up       1
>> 98      2.73                osd.98              up       1
>> 99      2.73                osd.99              up       1
>> 100     2.73                osd.100             up       1
>> 101     2.73                osd.101             up       1
>> 102     2.73                osd.102             up       1
>> 103     2.73                osd.103             up       1
>> 104     2.73                osd.104             up       1
>> 105     2.73                osd.105             up       1
>> -1      68.96   pool default
>> -3      68.96       rack unknownrack
>> -9      5               host gstore5
>> 2       5                   osd.2               down     1
>> -10     16              host cstore1
>> 5       1                   osd.5               down     0
>> 6       1                   osd.6               down     0
>> 7       1                   osd.7               down     0
>> 8       1                   osd.8               down     0
>> 9       1                   osd.9               down     0
>> 10      1                   osd.10              down     0
>> 11      1                   osd.11              down     0
>> 12      1                   osd.12              down     0
>> 13      1                   osd.13              down     0
>> 14      1                   osd.14              down     0
>> 4       1                   osd.4               down     0
>> 47      1                   osd.47              down     0
>> 48      1                   osd.48              down     0
>> 49      1                   osd.49              down     0
>> 50      1                   osd.50              down     0
>> 51      1                   osd.51              down     0
>> -11     21              host cstore2
>> 15      1                   osd.15              down     0
>> 16      1                   osd.16              down     0
>> 17      1                   osd.17              down     0
>> 18      1                   osd.18              down     0
>> 19      1                   osd.19              down     0
>> 20      1                   osd.20              down     0
>> 21      1                   osd.21              down     0
>> 22      1                   osd.22              down     0
>> 23      1                   osd.23              down     0
>> 24      1                   osd.24              down     0
>> 41      1                   osd.41              down     0
>> 42      1                   osd.42              down     0
>> 43      1                   osd.43              down     0
>> 44      1                   osd.44              down     0
>> 45      1                   osd.45              down     0
>> 46      1                   osd.46              down     0
>> 52      1                   osd.52              down     0
>> 53      1                   osd.53              down     0
>> 54      1                   osd.54              down     0
>> 55      1                   osd.55              down     0
>> 56      1                   osd.56               up      1
>> 57      0                   osd.57               up      1
>> -12     16              host cstore3
>> 25      1                   osd.25              down     0
>> 26      1                   osd.26              down     0
>> 27      1                   osd.27              down     0
>> 28      1                   osd.28              down     0
>> 29      1                   osd.29              down     0
>> 30      1                   osd.30              down     0
>> 31      1                   osd.31              down     0
>> 32      1                   osd.32              down     0
>> 33      1                   osd.33              down     0
>> 34      1                   osd.34              down     0
>> 35      1                   osd.35              down     0
>> 36      1                   osd.36              down     0
>> 37      1                   osd.37              down     0
>> 38      1                   osd.38              down     0
>> 39      1                   osd.39              down     0
>> 40      1                   osd.40              down     0
>> -13     7.64            host cstore4
>> 62      0.55                osd.62              down     0
>> 63      0.55                osd.63              down     0
>> 64      0.55                osd.64              down     0
>> 65      0.55                osd.65              down     0
>> 66      0.55                osd.66              down     0
>> 67      0.55                osd.67              down     0
>> 68      0.55                osd.68              down     0
>> 69      0.55                osd.69              down     0
>> 70      0.27                osd.70              down     0
>> 71      0.27                osd.71              down     0
>> 72      0.27                osd.72              down     0
>> 73      0.27                osd.73              down     0
>> 74      0.27                osd.74              down     0
>> 75      0.27                osd.75              down     0
>> 76      0.27                osd.76              down     0
>> 77      0.27                osd.77              down     0
>> 78      0.27                osd.78              down     0
>> 79      0.27                osd.79              down     0
>> 80      0.27                osd.80              down     0
>> 81      0.27                osd.81              down     0
>> -14     3.32            host cstore5
>> 0       0.4                 osd.0               down     0
>> 1       0.5                 osd.1               down     0
>> 3       0.4                 osd.3               down     0
>> 58      0.4                 osd.58              down     1
>> 59      0.54                osd.59              down     1
>> 60      0.54                osd.60              down     1
>> 61      0.54                osd.61              down     1
>>
>> The crush map is:
>> # begin crush map
>>
>> # devices
>> device 0 osd.0
>> device 1 osd.1
>> device 2 osd.2
>> device 3 osd.3
>> device 4 osd.4
>> device 5 osd.5
>> device 6 osd.6
>> device 7 osd.7
>> device 8 osd.8
>> device 9 osd.9
>> device 10 osd.10
>> device 11 osd.11
>> device 12 osd.12
>> device 13 osd.13
>> device 14 osd.14
>> device 15 osd.15
>> device 16 osd.16
>> device 17 osd.17
>> device 18 osd.18
>> device 19 osd.19
>> device 20 osd.20
>> device 21 osd.21
>> device 22 osd.22
>> device 23 osd.23
>> device 24 osd.24
>> device 25 osd.25
>> device 26 osd.26
>> device 27 osd.27
>> device 28 osd.28
>> device 29 osd.29
>> device 30 osd.30
>> device 31 osd.31
>> device 32 osd.32
>> device 33 osd.33
>> device 34 osd.34
>> device 35 osd.35
>> device 36 osd.36
>> device 37 osd.37
>> device 38 osd.38
>> device 39 osd.39
>> device 40 osd.40
>> device 41 osd.41
>> device 42 osd.42
>> device 43 osd.43
>> device 44 osd.44
>> device 45 osd.45
>> device 46 osd.46
>> device 47 osd.47
>> device 48 osd.48
>> device 49 osd.49
>> device 50 osd.50
>> device 51 osd.51
>> device 52 osd.52
>> device 53 osd.53
>> device 54 osd.54
>> device 55 osd.55
>> device 56 osd.56
>> device 57 osd.57
>> device 58 osd.58
>> device 59 osd.59
>> device 60 osd.60
>> device 61 osd.61
>> device 62 osd.62
>> device 63 osd.63
>> device 64 osd.64
>> device 65 osd.65
>> device 66 osd.66
>> device 67 osd.67
>> device 68 osd.68
>> device 69 osd.69
>> device 70 osd.70
>> device 71 osd.71
>> device 72 osd.72
>> device 73 osd.73
>> device 74 osd.74
>> device 75 osd.75
>> device 76 osd.76
>> device 77 osd.77
>> device 78 osd.78
>> device 79 osd.79
>> device 80 osd.80
>> device 81 osd.81
>> device 82 osd.82
>> device 83 osd.83
>> device 84 osd.84
>> device 85 osd.85
>> device 86 osd.86
>> device 87 osd.87
>> device 88 osd.88
>> device 89 osd.89
>> device 90 osd.90
>> device 91 osd.91
>> device 92 osd.92
>> device 93 osd.93
>> device 94 osd.94
>> device 95 osd.95
>> device 96 osd.96
>> device 97 osd.97
>> device 98 osd.98
>> device 99 osd.99
>> device 100 osd.100
>> device 101 osd.101
>> device 102 osd.102
>> device 103 osd.103
>> device 104 osd.104
>> device 105 osd.105
>>
>> # types
>> type 0 osd
>> type 1 host
>> type 2 rack
>> type 3 row
>> type 4 room
>> type 5 datacenter
>> type 6 pool
>>
>> # buckets
>> host gstore5 {
>>     id -9       # do not change unnecessarily
>>     # weight 5.000
>>     alg straw
>>     hash 0      # rjenkins1
>>     item osd.2 weight 5.000
>> }
>> host cstore1 {
>>     id -10      # do not change unnecessarily
>>     # weight 16.000
>>     alg straw
>>     hash 0      # rjenkins1
>>     item osd.5 weight 1.000
>>     item osd.6 weight 1.000
>>     item osd.7 weight 1.000
>>     item osd.8 weight 1.000
>>     item osd.9 weight 1.000
>>     item osd.10 weight 1.000
>>     item osd.11 weight 1.000
>>     item osd.12 weight 1.000
>>     item osd.13 weight 1.000
>>     item osd.14 weight 1.000
>>     item osd.4 weight 1.000
>>     item osd.47 weight 1.000
>>     item osd.48 weight 1.000
>>     item osd.49 weight 1.000
>>     item osd.50 weight 1.000
>>     item osd.51 weight 1.000
>> }
>> host cstore2 {
>>     id -11      # do not change unnecessarily
>>     # weight 20.000
>>     alg straw
>>     hash 0      # rjenkins1
>>     item osd.15 weight 1.000
>>     item osd.16 weight 1.000
>>     item osd.17 weight 1.000
>>     item osd.18 weight 1.000
>>     item osd.19 weight 1.000
>>     item osd.20 weight 1.000
>>     item osd.21 weight 1.000
>>     item osd.22 weight 1.000
>>     item osd.23 weight 1.000
>>     item osd.24 weight 1.000
>>     item osd.41 weight 1.000
>>     item osd.42 weight 1.000
>>     item osd.43 weight 1.000
>>     item osd.44 weight 1.000
>>     item osd.45 weight 1.000
>>     item osd.46 weight 1.000
>>     item osd.52 weight 1.000
>>     item osd.53 weight 1.000
>>     item osd.54 weight 1.000
>>     item osd.55 weight 1.000
>>     item osd.56 weight 0.000
>>     item osd.57 weight 0.000
>> }
>> host cstore3 {
>>     id -12      # do not change unnecessarily
>>     # weight 16.000
>>     alg straw
>>     hash 0      # rjenkins1
>>     item osd.25 weight 1.000
>>     item osd.26 weight 1.000
>>     item osd.27 weight 1.000
>>     item osd.28 weight 1.000
>>     item osd.29 weight 1.000
>>     item osd.30 weight 1.000
>>     item osd.31 weight 1.000
>>     item osd.32 weight 1.000
>>     item osd.33 weight 1.000
>>     item osd.34 weight 1.000
>>     item osd.35 weight 1.000
>>     item osd.36 weight 1.000
>>     item osd.37 weight 1.000
>>     item osd.38 weight 1.000
>>     item osd.39 weight 1.000
>>     item osd.40 weight 1.000
>> }
>> host cstore4 {
>>     id -13      # do not change unnecessarily
>>     # weight 7.640
>>     alg straw
>>     hash 0      # rjenkins1
>>     item osd.62 weight 0.550
>>     item osd.63 weight 0.550
>>     item osd.64 weight 0.550
>>     item osd.65 weight 0.550
>>     item osd.66 weight 0.550
>>     item osd.67 weight 0.550
>>     item osd.68 weight 0.550
>>     item osd.69 weight 0.550
>>     item osd.70 weight 0.270
>>     item osd.71 weight 0.270
>>     item osd.72 weight 0.270
>>     item osd.73 weight 0.270
>>     item osd.74 weight 0.270
>>     item osd.75 weight 0.270
>>     item osd.76 weight 0.270
>>     item osd.77 weight 0.270
>>     item osd.78 weight 0.270
>>     item osd.79 weight 0.270
>>     item osd.80 weight 0.270
>>     item osd.81 weight 0.270
>> }
>> host cstore5 {
>>     id -14      # do not change unnecessarily
>>     # weight 3.320
>>     alg straw
>>     hash 0      # rjenkins1
>>     item osd.0 weight 0.400
>>     item osd.1 weight 0.500
>>     item osd.3 weight 0.400
>>     item osd.58 weight 0.400
>>     item osd.59 weight 0.540
>>     item osd.60 weight 0.540
>>     item osd.61 weight 0.540
>> }
>> rack unknownrack {
>>     id -3       # do not change unnecessarily
>>     # weight 67.960
>>     alg straw
>>     hash 0      # rjenkins1
>>     item gstore5 weight 5.000
>>     item cstore1 weight 16.000
>>     item cstore2 weight 20.000
>>     item cstore3 weight 16.000
>>     item cstore4 weight 7.640
>>     item cstore5 weight 3.320
>> }
>> pool default {
>>     id -1       # do not change unnecessarily
>>     # weight 67.960
>>     alg straw
>>     hash 0      # rjenkins1
>>     item unknownrack weight 67.960
>> }
>> host gstore1 {
>>     id -6       # do not change unnecessarily
>>     # weight 32.760
>>     alg straw
>>     hash 0      # rjenkins1
>>     item osd.82 weight 2.730
>>     item osd.83 weight 2.730
>>     item osd.84 weight 2.730
>>     item osd.85 weight 2.730
>>     item osd.86 weight 2.730
>>     item osd.87 weight 2.730
>>     item osd.88 weight 2.730
>>     item osd.89 weight 2.730
>>     item osd.90 weight 2.730
>>     item osd.91 weight 2.730
>>     item osd.92 weight 2.730
>>     item osd.93 weight 2.730
>> }
>> datacenter datacenter-cod {
>>     id -4       # do not change unnecessarily
>>     # weight 32.760
>>     alg straw
>>     hash 0      # rjenkins1
>>     item gstore1 weight 32.760
>> }
>> host gstore2 {
>>     id -7       # do not change unnecessarily
>>     # weight 32.760
>>     alg straw
>>     hash 0      # rjenkins1
>>     item osd.94 weight 2.730
>>     item osd.95 weight 2.730
>>     item osd.96 weight 2.730
>>     item osd.97 weight 2.730
>>     item osd.98 weight 2.730
>>     item osd.99 weight 2.730
>>     item osd.100 weight 2.730
>>     item osd.101 weight 2.730
>>     item osd.102 weight 2.730
>>     item osd.103 weight 2.730
>>     item osd.104 weight 2.730
>>     item osd.105 weight 2.730
>> }
>> datacenter datacenter-rcod {
>>     id -5       # do not change unnecessarily
>>     # weight 32.760
>>     alg straw
>>     hash 0      # rjenkins1
>>     item gstore2 weight 32.760
>> }
>> pool iscsi {
>>     id -2       # do not change unnecessarily
>>     # weight 65.520
>>     alg straw
>>     hash 0      # rjenkins1
>>     item datacenter-cod weight 32.760
>>     item datacenter-rcod weight 32.760
>> }
>>
>> # rules
>> rule data {
>>     ruleset 0
>>     type replicated
>>     min_size 1
>>     max_size 10
>>     step take default
>>     step chooseleaf firstn 0 type host
>>     step emit
>> }
>> rule metadata {
>>     ruleset 1
>>     type replicated
>>     min_size 1
>>     max_size 10
>>     step take default
>>     step chooseleaf firstn 0 type host
>>     step emit
>> }
>> rule rbd {
>>     ruleset 2
>>     type replicated
>>     min_size 1
>>     max_size 10
>>     step take default
>>     step chooseleaf firstn 0 type host
>>     step emit
>> }
>>
>> # end crush map
>>
>>
>> Restarting the MONs doesn't help.
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



