Re: all OSDs crash on start

>Have you run this crush map through any test mappings yet?
Yes, it worked on the test cluster, and it also worked after applying the
map to the main cluster. The OSD servers went down after I tried to apply
crush ruleset 3 (iscsi) to the iscsi pool:
ceph osd pool set data crush_ruleset 3
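
For reference, a quick way to sanity-check a ruleset against the compiled
map before pointing a pool at it is crushtool's test mode. This is a
minimal sketch; the file names and the replica count of 2 are assumptions,
not values taken from this cluster:

  ceph osd getcrushmap -o crushmap.bin        # map currently in use
  crushtool -d crushmap.bin -o crushmap.txt   # decompile for inspection
  crushtool -i crushmap.bin --test --rule 3 --num-rep 2 \
      --show-statistics --show-mappings

If the test reports bad or empty mappings for rule 3, that is worth
investigating before applying the ruleset to a live pool. The decompiled
map quoted further down defines only rulesets 0-2, so the ruleset 3 that
was applied is not visible there; a rule selecting from the iscsi root
would look roughly like the following in the same syntax. This is a
hypothetical illustration, not the actual rule from the cluster, and the
chooseleaf bucket type is a guess:

rule iscsi {
        ruleset 3        # hypothetical reconstruction, for illustration only
        type replicated
        min_size 1
        max_size 10
        step take iscsi
        step chooseleaf firstn 0 type datacenter
        step emit
}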

2013/7/16 Gregory Farnum <greg@xxxxxxxxxxx>:
> It's probably not the same issue as that ticket, which was about the
> OSD handling a lack of output incorrectly. (It might be handling the
> output incorrectly in some other way, but hopefully not...)
>
> Have you run this crush map through any test mappings yet?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Sun, Jul 14, 2013 at 10:59 PM, Vladislav Gorbunov <vadikgo@xxxxxxxxx> wrote:
>> The symptoms look like http://tracker.ceph.com/issues/4699:
>>
>> on all OSDs, the ceph-osd process crashes with a segmentation fault.
>>
>> If I stop the MON daemons I can start the OSDs, but if I start the MONs
>> again, all the OSDs die again.
>>
>> A more detailed log:
>>      0> 2013-07-15 16:42:05.001242 7ffe5a6fc700 -1 *** Caught signal
>> (Segmentation fault) **
>>  in thread 7ffe5a6fc700
>>
>>  ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
>>  1: /usr/bin/ceph-osd() [0x790e5a]
>>  2: (()+0xfcb0) [0x7ffe6b729cb0]
>>  3: /usr/bin/ceph-osd() [0x893879]
>>  4: (crush_do_rule()+0x1e5) [0x894065]
>>  5: (CrushWrapper::do_rule(int, int, std::vector<int,
>> std::allocator<int> >&, int, std::vector<unsigned int,
>> std::allocator<unsigned int> > const&) const+0x7a) [0x81b2ba]
>>  6: (OSDMap::_pg_to_osds(pg_pool_t const&, pg_t, std::vector<int,
>> std::allocator<int> >&) const+0x8f) [0x80d7cf]
>>  7: (OSDMap::pg_to_up_acting_osds(pg_t, std::vector<int,
>> std::allocator<int> >&, std::vector<int, std::allocator<int> >&)
>> const+0xa6) [0x80d8a6]
>>  8: (OSD::advance_pg(unsigned int, PG*, ThreadPool::TPHandle&,
>> PG::RecoveryCtx*, std::set<boost::intrusive_ptr<PG>,
>> std::less<boost::intrusive_ptr<PG> >,
>> std::allocator<boost::intrusive_ptr<PG> > >*)+0x190) [0x631b70]
>>  9: (OSD::process_peering_events(std::list<PG*, std::allocator<PG*> >
>> const&, ThreadPool::TPHandle&)+0x244) [0x632254]
>>  10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> >
>> const&, ThreadPool::TPHandle&)+0x12) [0x66d0f2]
>>  11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x835a66]
>>  12: (ThreadPool::WorkThread::entry()+0x10) [0x837890]
>>  13: (()+0x7e9a) [0x7ffe6b721e9a]
>>  14: (clone()+0x6d) [0x7ffe69f9fccd]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> --- logging levels ---
>>    0/ 5 none
>>    0/ 1 lockdep
>>    0/ 1 context
>>    1/ 1 crush
>>    1/ 5 mds
>>    1/ 5 mds_balancer
>>    1/ 5 mds_locker
>>    1/ 5 mds_log
>>    1/ 5 mds_log_expire
>>    1/ 5 mds_migrator
>>    0/ 1 buffer
>>    0/ 1 timer
>>    0/ 1 filer
>>    0/ 1 striper
>>    0/ 1 objecter
>>    0/ 5 rados
>>    0/ 5 rbd
>>    0/ 5 journaler
>>    0/ 5 objectcacher
>>    0/ 5 client
>>    0/ 5 osd
>>    0/ 5 optracker
>>    0/ 5 objclass
>>    1/ 3 filestore
>>    1/ 3 journal
>>    0/ 5 ms
>>    1/ 5 mon
>>    0/10 monc
>>    0/ 5 paxos
>>    0/ 5 tp
>>    1/ 5 auth
>>    1/ 5 crypto
>>    1/ 1 finisher
>>    1/ 5 heartbeatmap
>>    1/ 5 perfcounter
>>    1/ 5 rgw
>>    1/ 5 hadoop
>>    1/ 5 javaclient
>>    1/ 5 asok
>>    1/ 1 throttle
>>   -2/-2 (syslog threshold)
>>   -1/-1 (stderr threshold)
>>   max_recent     10000
>>   max_new         1000
>>   log_file /var/log/ceph/ceph-osd.2.log
>> --- end dump of recent events ---
>>
>> 2013/7/14 Vladislav Gorbunov <vadikgo@xxxxxxxxx>:
>>> Hello!
>>>
>>> After changing the crush map, all OSDs (ceph version 0.61.4
>>> (1669132fcfc27d0c0b5e5bb93ade59d147e23404)) in pool default crashed
>>> with the error:
>>> 2013-07-14 17:26:23.755432 7f0c963ad700 -1 *** Caught signal
>>> (Segmentation fault) **
>>>  in thread 7f0c963ad700
>>> ...skipping...
>>>  10: (OSD::PeeringWQ::_process(std::list<PG*, std::allocator<PG*> >
>>> const&, ThreadPool::TPHandle&)+0x12) [0x66d0f2]
>>>  11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x835a66]
>>>  12: (ThreadPool::WorkThread::entry()+0x10) [0x837890]
>>>  13: (()+0x7e9a) [0x7fac4e597e9a]
>>>  14: (clone()+0x6d) [0x7fac4ce15ccd]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>> --- logging levels ---
>>>    0/ 5 none
>>>    0/ 1 lockdep
>>>    0/ 1 context
>>>    1/ 1 crush
>>>    1/ 5 mds
>>>    1/ 5 mds_balancer
>>>    1/ 5 mds_locker
>>>    1/ 5 mds_log
>>>    1/ 5 mds_log_expire
>>>    1/ 5 mds_migrator
>>>    0/ 1 buffer
>>>    0/ 1 timer
>>>    0/ 1 filer
>>>    0/ 1 striper
>>>    0/ 1 objecter
>>>    0/ 5 rados
>>>    0/ 5 rbd
>>>    0/ 5 journaler
>>>    0/ 5 objectcacher
>>>    0/ 5 client
>>>    0/ 5 osd
>>>    0/ 5 optracker
>>>    0/ 5 objclass
>>>    1/ 3 filestore
>>>    1/ 3 journal
>>>    0/ 5 ms
>>>    1/ 5 mon
>>>    0/10 monc
>>>    0/ 5 paxos
>>>    0/ 5 tp
>>>    1/ 5 auth
>>>    1/ 5 crypto
>>>    1/ 1 finisher
>>>    1/ 5 heartbeatmap
>>>    1/ 5 perfcounter
>>>    1/ 5 rgw
>>>    1/ 5 hadoop
>>>    1/ 5 javaclient
>>>    1/ 5 asok
>>>    1/ 1 throttle
>>>   -2/-2 (syslog threshold)
>>>   -1/-1 (stderr threshold)
>>>   max_recent     10000
>>>   max_new         1000
>>>   log_file /var/log/ceph/ceph-osd.2.log
>>> --- end dump of recent events ---
>>>
>>> The ceph osd tree output completely ignores OSD start/stop and always
>>> shows this map:
>>> # id weight type name up/down reweight
>>> -2 65.52 pool iscsi
>>> -4 32.76 datacenter datacenter-cod
>>> -6 32.76 host gstore1
>>> 82 2.73 osd.82 up 1
>>> 83 2.73 osd.83 up 1
>>> 84 2.73 osd.84 up 1
>>> 85 2.73 osd.85 up 1
>>> 86 2.73 osd.86 up 1
>>> 87 2.73 osd.87 up 1
>>> 88 2.73 osd.88 up 1
>>> 89 2.73 osd.89 up 1
>>> 90 2.73 osd.90 up 1
>>> 91 2.73 osd.91 up 1
>>> 92 2.73 osd.92 up 1
>>> 93 2.73 osd.93 up 1
>>> -5 32.76 datacenter datacenter-rcod
>>> -7 32.76 host gstore2
>>> 94 2.73 osd.94 up 1
>>> 95 2.73 osd.95 up 1
>>> 96 2.73 osd.96 up 1
>>> 97 2.73 osd.97 up 1
>>> 98 2.73 osd.98 up 1
>>> 99 2.73 osd.99 up 1
>>> 100 2.73 osd.100 up 1
>>> 101 2.73 osd.101 up 1
>>> 102 2.73 osd.102 up 1
>>> 103 2.73 osd.103 up 1
>>> 104 2.73 osd.104 up 1
>>> 105 2.73 osd.105 up 1
>>> -1 68.96 pool default
>>> -3 68.96 rack unknownrack
>>> -9 5 host gstore5
>>> 2 5 osd.2 down 1
>>> -10 16 host cstore1
>>> 5 1 osd.5 down 0
>>> 6 1 osd.6 down 0
>>> 7 1 osd.7 down 0
>>> 8 1 osd.8 down 0
>>> 9 1 osd.9 down 0
>>> 10 1 osd.10 down 0
>>> 11 1 osd.11 down 0
>>> 12 1 osd.12 down 0
>>> 13 1 osd.13 down 0
>>> 14 1 osd.14 down 0
>>> 4 1 osd.4 down 0
>>> 47 1 osd.47 down 0
>>> 48 1 osd.48 down 0
>>> 49 1 osd.49 down 0
>>> 50 1 osd.50 down 0
>>> 51 1 osd.51 down 0
>>> -11 21 host cstore2
>>> 15 1 osd.15 down 0
>>> 16 1 osd.16 down 0
>>> 17 1 osd.17 down 0
>>> 18 1 osd.18 down 0
>>> 19 1 osd.19 down 0
>>> 20 1 osd.20 down 0
>>> 21 1 osd.21 down 0
>>> 22 1 osd.22 down 0
>>> 23 1 osd.23 down 0
>>> 24 1 osd.24 down 0
>>> 41 1 osd.41 down 0
>>> 42 1 osd.42 down 0
>>> 43 1 osd.43 down 0
>>> 44 1 osd.44 down 0
>>> 45 1 osd.45 down 0
>>> 46 1 osd.46 down 0
>>> 52 1 osd.52 down 0
>>> 53 1 osd.53 down 0
>>> 54 1 osd.54 down 0
>>> 55 1 osd.55 down 0
>>> 56 1 osd.56 up 1
>>> 57 0 osd.57 up 1
>>> -12 16 host cstore3
>>> 25 1 osd.25 down 0
>>> 26 1 osd.26 down 0
>>> 27 1 osd.27 down 0
>>> 28 1 osd.28 down 0
>>> 29 1 osd.29 down 0
>>> 30 1 osd.30 down 0
>>> 31 1 osd.31 down 0
>>> 32 1 osd.32 down 0
>>> 33 1 osd.33 down 0
>>> 34 1 osd.34 down 0
>>> 35 1 osd.35 down 0
>>> 36 1 osd.36 down 0
>>> 37 1 osd.37 down 0
>>> 38 1 osd.38 down 0
>>> 39 1 osd.39 down 0
>>> 40 1 osd.40 down 0
>>> -13 7.64 host cstore4
>>> 62 0.55 osd.62 down 0
>>> 63 0.55 osd.63 down 0
>>> 64 0.55 osd.64 down 0
>>> 65 0.55 osd.65 down 0
>>> 66 0.55 osd.66 down 0
>>> 67 0.55 osd.67 down 0
>>> 68 0.55 osd.68 down 0
>>> 69 0.55 osd.69 down 0
>>> 70 0.27 osd.70 down 0
>>> 71 0.27 osd.71 down 0
>>> 72 0.27 osd.72 down 0
>>> 73 0.27 osd.73 down 0
>>> 74 0.27 osd.74 down 0
>>> 75 0.27 osd.75 down 0
>>> 76 0.27 osd.76 down 0
>>> 77 0.27 osd.77 down 0
>>> 78 0.27 osd.78 down 0
>>> 79 0.27 osd.79 down 0
>>> 80 0.27 osd.80 down 0
>>> 81 0.27 osd.81 down 0
>>> -14 3.32 host cstore5
>>> 0 0.4 osd.0 down 0
>>> 1 0.5 osd.1 down 0
>>> 3 0.4 osd.3 down 0
>>> 58 0.4 osd.58 down 1
>>> 59 0.54 osd.59 down 1
>>> 60 0.54 osd.60 down 1
>>> 61 0.54 osd.61 down 1
>>>
>>> The crush map is:
>>> # begin crush map
>>>
>>> # devices
>>> device 0 osd.0
>>> device 1 osd.1
>>> device 2 osd.2
>>> device 3 osd.3
>>> device 4 osd.4
>>> device 5 osd.5
>>> device 6 osd.6
>>> device 7 osd.7
>>> device 8 osd.8
>>> device 9 osd.9
>>> device 10 osd.10
>>> device 11 osd.11
>>> device 12 osd.12
>>> device 13 osd.13
>>> device 14 osd.14
>>> device 15 osd.15
>>> device 16 osd.16
>>> device 17 osd.17
>>> device 18 osd.18
>>> device 19 osd.19
>>> device 20 osd.20
>>> device 21 osd.21
>>> device 22 osd.22
>>> device 23 osd.23
>>> device 24 osd.24
>>> device 25 osd.25
>>> device 26 osd.26
>>> device 27 osd.27
>>> device 28 osd.28
>>> device 29 osd.29
>>> device 30 osd.30
>>> device 31 osd.31
>>> device 32 osd.32
>>> device 33 osd.33
>>> device 34 osd.34
>>> device 35 osd.35
>>> device 36 osd.36
>>> device 37 osd.37
>>> device 38 osd.38
>>> device 39 osd.39
>>> device 40 osd.40
>>> device 41 osd.41
>>> device 42 osd.42
>>> device 43 osd.43
>>> device 44 osd.44
>>> device 45 osd.45
>>> device 46 osd.46
>>> device 47 osd.47
>>> device 48 osd.48
>>> device 49 osd.49
>>> device 50 osd.50
>>> device 51 osd.51
>>> device 52 osd.52
>>> device 53 osd.53
>>> device 54 osd.54
>>> device 55 osd.55
>>> device 56 osd.56
>>> device 57 osd.57
>>> device 58 osd.58
>>> device 59 osd.59
>>> device 60 osd.60
>>> device 61 osd.61
>>> device 62 osd.62
>>> device 63 osd.63
>>> device 64 osd.64
>>> device 65 osd.65
>>> device 66 osd.66
>>> device 67 osd.67
>>> device 68 osd.68
>>> device 69 osd.69
>>> device 70 osd.70
>>> device 71 osd.71
>>> device 72 osd.72
>>> device 73 osd.73
>>> device 74 osd.74
>>> device 75 osd.75
>>> device 76 osd.76
>>> device 77 osd.77
>>> device 78 osd.78
>>> device 79 osd.79
>>> device 80 osd.80
>>> device 81 osd.81
>>> device 82 osd.82
>>> device 83 osd.83
>>> device 84 osd.84
>>> device 85 osd.85
>>> device 86 osd.86
>>> device 87 osd.87
>>> device 88 osd.88
>>> device 89 osd.89
>>> device 90 osd.90
>>> device 91 osd.91
>>> device 92 osd.92
>>> device 93 osd.93
>>> device 94 osd.94
>>> device 95 osd.95
>>> device 96 osd.96
>>> device 97 osd.97
>>> device 98 osd.98
>>> device 99 osd.99
>>> device 100 osd.100
>>> device 101 osd.101
>>> device 102 osd.102
>>> device 103 osd.103
>>> device 104 osd.104
>>> device 105 osd.105
>>>
>>> # types
>>> type 0 osd
>>> type 1 host
>>> type 2 rack
>>> type 3 row
>>> type 4 room
>>> type 5 datacenter
>>> type 6 pool
>>>
>>> # buckets
>>> host gstore5 {
>>> id -9 # do not change unnecessarily
>>> # weight 5.000
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.2 weight 5.000
>>> }
>>> host cstore1 {
>>> id -10 # do not change unnecessarily
>>> # weight 16.000
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.5 weight 1.000
>>> item osd.6 weight 1.000
>>> item osd.7 weight 1.000
>>> item osd.8 weight 1.000
>>> item osd.9 weight 1.000
>>> item osd.10 weight 1.000
>>> item osd.11 weight 1.000
>>> item osd.12 weight 1.000
>>> item osd.13 weight 1.000
>>> item osd.14 weight 1.000
>>> item osd.4 weight 1.000
>>> item osd.47 weight 1.000
>>> item osd.48 weight 1.000
>>> item osd.49 weight 1.000
>>> item osd.50 weight 1.000
>>> item osd.51 weight 1.000
>>> }
>>> host cstore2 {
>>> id -11 # do not change unnecessarily
>>> # weight 20.000
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.15 weight 1.000
>>> item osd.16 weight 1.000
>>> item osd.17 weight 1.000
>>> item osd.18 weight 1.000
>>> item osd.19 weight 1.000
>>> item osd.20 weight 1.000
>>> item osd.21 weight 1.000
>>> item osd.22 weight 1.000
>>> item osd.23 weight 1.000
>>> item osd.24 weight 1.000
>>> item osd.41 weight 1.000
>>> item osd.42 weight 1.000
>>> item osd.43 weight 1.000
>>> item osd.44 weight 1.000
>>> item osd.45 weight 1.000
>>> item osd.46 weight 1.000
>>> item osd.52 weight 1.000
>>> item osd.53 weight 1.000
>>> item osd.54 weight 1.000
>>> item osd.55 weight 1.000
>>> item osd.56 weight 0.000
>>> item osd.57 weight 0.000
>>> }
>>> host cstore3 {
>>> id -12 # do not change unnecessarily
>>> # weight 16.000
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.25 weight 1.000
>>> item osd.26 weight 1.000
>>> item osd.27 weight 1.000
>>> item osd.28 weight 1.000
>>> item osd.29 weight 1.000
>>> item osd.30 weight 1.000
>>> item osd.31 weight 1.000
>>> item osd.32 weight 1.000
>>> item osd.33 weight 1.000
>>> item osd.34 weight 1.000
>>> item osd.35 weight 1.000
>>> item osd.36 weight 1.000
>>> item osd.37 weight 1.000
>>> item osd.38 weight 1.000
>>> item osd.39 weight 1.000
>>> item osd.40 weight 1.000
>>> }
>>> host cstore4 {
>>> id -13 # do not change unnecessarily
>>> # weight 7.640
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.62 weight 0.550
>>> item osd.63 weight 0.550
>>> item osd.64 weight 0.550
>>> item osd.65 weight 0.550
>>> item osd.66 weight 0.550
>>> item osd.67 weight 0.550
>>> item osd.68 weight 0.550
>>> item osd.69 weight 0.550
>>> item osd.70 weight 0.270
>>> item osd.71 weight 0.270
>>> item osd.72 weight 0.270
>>> item osd.73 weight 0.270
>>> item osd.74 weight 0.270
>>> item osd.75 weight 0.270
>>> item osd.76 weight 0.270
>>> item osd.77 weight 0.270
>>> item osd.78 weight 0.270
>>> item osd.79 weight 0.270
>>> item osd.80 weight 0.270
>>> item osd.81 weight 0.270
>>> }
>>> host cstore5 {
>>> id -14 # do not change unnecessarily
>>> # weight 3.320
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.0 weight 0.400
>>> item osd.1 weight 0.500
>>> item osd.3 weight 0.400
>>> item osd.58 weight 0.400
>>> item osd.59 weight 0.540
>>> item osd.60 weight 0.540
>>> item osd.61 weight 0.540
>>> }
>>> rack unknownrack {
>>> id -3 # do not change unnecessarily
>>> # weight 67.960
>>> alg straw
>>> hash 0 # rjenkins1
>>> item gstore5 weight 5.000
>>> item cstore1 weight 16.000
>>> item cstore2 weight 20.000
>>> item cstore3 weight 16.000
>>> item cstore4 weight 7.640
>>> item cstore5 weight 3.320
>>> }
>>> pool default {
>>> id -1 # do not change unnecessarily
>>> # weight 67.960
>>> alg straw
>>> hash 0 # rjenkins1
>>> item unknownrack weight 67.960
>>> }
>>> host gstore1 {
>>> id -6 # do not change unnecessarily
>>> # weight 32.760
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.82 weight 2.730
>>> item osd.83 weight 2.730
>>> item osd.84 weight 2.730
>>> item osd.85 weight 2.730
>>> item osd.86 weight 2.730
>>> item osd.87 weight 2.730
>>> item osd.88 weight 2.730
>>> item osd.89 weight 2.730
>>> item osd.90 weight 2.730
>>> item osd.91 weight 2.730
>>> item osd.92 weight 2.730
>>> item osd.93 weight 2.730
>>> }
>>> datacenter datacenter-cod {
>>> id -4 # do not change unnecessarily
>>> # weight 32.760
>>> alg straw
>>> hash 0 # rjenkins1
>>> item gstore1 weight 32.760
>>> }
>>> host gstore2 {
>>> id -7 # do not change unnecessarily
>>> # weight 32.760
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.94 weight 2.730
>>> item osd.95 weight 2.730
>>> item osd.96 weight 2.730
>>> item osd.97 weight 2.730
>>> item osd.98 weight 2.730
>>> item osd.99 weight 2.730
>>> item osd.100 weight 2.730
>>> item osd.101 weight 2.730
>>> item osd.102 weight 2.730
>>> item osd.103 weight 2.730
>>> item osd.104 weight 2.730
>>> item osd.105 weight 2.730
>>> }
>>> datacenter datacenter-rcod {
>>> id -5 # do not change unnecessarily
>>> # weight 32.760
>>> alg straw
>>> hash 0 # rjenkins1
>>> item gstore2 weight 32.760
>>> }
>>> pool iscsi {
>>> id -2 # do not change unnecessarily
>>> # weight 65.520
>>> alg straw
>>> hash 0 # rjenkins1
>>> item datacenter-cod weight 32.760
>>> item datacenter-rcod weight 32.760
>>> }
>>>
>>> # rules
>>> rule data {
>>> ruleset 0
>>> type replicated
>>> min_size 1
>>> max_size 10
>>> step take default
>>> step chooseleaf firstn 0 type host
>>> step emit
>>> }
>>> rule metadata {
>>> ruleset 1
>>> type replicated
>>> min_size 1
>>> max_size 10
>>> step take default
>>> step chooseleaf firstn 0 type host
>>> step emit
>>> }
>>> rule rbd {
>>> ruleset 2
>>> type replicated
>>> min_size 1
>>> max_size 10
>>> step take default
>>> step chooseleaf firstn 0 type host
>>> step emit
>>> }
>>>
>>> # end crush map
>>>
>>>
>>> Restarting the MONs doesn't help.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



