Re: CRUSH Rule Review - Not replicating correctly


Not that I know of.
- ----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Mon, Jan 18, 2016 at 10:33 AM, deeepdish  wrote:
> Thanks Robert. Will definitely try this. Is there a way to implement “gradual CRUSH” changes? I noticed that whenever cluster-wide changes are pushed (a new CRUSH map, for instance), the cluster immediately attempts to realign itself, disrupting client access / performance…
>
>
>> On Jan 18, 2016, at 12:22 , Robert LeBlanc  wrote:
>>
>> I'm not sure why you have six monitors. Six monitors buy you nothing
>> over five other than more power used, more latency, and more
>> headache, since quorum still requires a strict majority. See
>> http://docs.ceph.com/docs/hammer/rados/configuration/mon-config-ref/#monitor-quorum
>> for more info. I'd also consider five monitors overkill for a cluster
>> this size; I'd recommend three.
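>>
>> If you do trim the monitor count, removal is one command per monitor.
>> A minimal sketch, assuming the surplus monitors are named e and f
>> (hypothetical names):
>>
>> ceph mon remove e
>> ceph mon remove f
>>
>> Then stop and disable the corresponding ceph-mon daemons on those
>> hosts.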
>>
>> Although this is most likely not the root cause of your problem, you
>> probably have an error here: "root replicated-T1" points directly at
>> b02s08 and b02s12, while "site erbus" (via "rack b02") also points at
>> the same two hosts, so you have two parallel trees. You probably
>> meant to have "root replicated-T1" point at erbus instead.
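>>
>> A minimal sketch of what the corrected top of the hierarchy could
>> look like (illustrative only; the id/alg/hash lines are omitted and
>> the weights are copied from your map):
>>
>> root replicated-T1 {
>> item erbus weight 152.080
>> }
>> site erbus {
>> item b02 weight 152.080
>> }
>>
>> That way the root contains the site, the site contains the rack, and
>> the rack contains the hosts, rather than two trees pointing at the
>> same hosts.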
>>
>> I think your problem is in your "rule replicated" section. You can
>> try:
>> step take replicated-T1
>> step choose firstn 2 type host
>> step chooseleaf firstn 2 type osdgroup
>> step emit
>>
>> What this does is choose two hosts from the root replicated-T1 (which
>> happens to be both hosts you have), then choose one OSD from each of
>> two osdgroups on each host, for four OSDs split evenly across the two
>> hosts.
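>>
>> You can sanity-check the rule offline before injecting it; crushtool
>> can simulate the placements. A quick sketch, assuming the edited map
>> has been compiled to /tmp/crushmap.new (a placeholder path):
>>
>> crushtool -i /tmp/crushmap.new --test --rule 0 --num-rep 4 --show-mappings
>>
>> Each line of output should list four OSDs, two from each host.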
>>
>> I believe the problem with your current rule set is that firstn 0 type
>> host tries to select four hosts, but only two are available. You
>> should be able to see that with 'ceph pg dump', where only two osds
>> will be listed in the up set.
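>>
>> You can also check a single placement group directly, for example
>> (0.1f is just a placeholder pgid):
>>
>> ceph pg map 0.1f
>>
>> which prints the up and acting sets for that PG. With the rule above
>> you should see four OSDs there instead of two.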
>>
>> I hope that helps.
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Sun, Jan 17, 2016 at 6:31 PM, deeepdish  wrote:
>>> Hi Everyone,
>>>
>>> Looking for a double check of my logic and CRUSH map.
>>>
>>> Overview:
>>>
>>> - The osdgroup bucket type defines a failure domain within a host of 5
>>> OSDs + 1 SSD journal; the 5 OSDs sharing the same journal constitute
>>> one osdgroup bucket. Each host has 4 osdgroups.
>>> - 6 monitors
>>> - Two node cluster
>>> - Each node:
>>> - 20 OSDs
>>> -  4 SSDs
>>> - 4 osdgroups
>>>
>>> Desired Crush Rule outcome:
>>> - Assuming a pool with min_size=2 and size=4, each node would contain a
>>> redundant copy of each object. Should either host fail, access to
>>> data would be uninterrupted.
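>>>
>>> For reference, size and min_size are set per pool, e.g. ("rbd" being
>>> a placeholder pool name):
>>>
>>> ceph osd pool set rbd size 4
>>> ceph osd pool set rbd min_size 2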
>>>
>>> Current Crush Rule outcome:
>>> - There are 4 copies of each object; however, I don’t believe each node
>>> holds a redundant copy of each object. When a node fails, data is NOT
>>> accessible until ceph rebuilds itself / the node becomes accessible
>>> again.
>>>
>>> I suspect my CRUSH map is not right, and remedying it may take some
>>> time and cause the cluster to be unresponsive / unavailable. Is there
>>> a way / method to apply substantial CRUSH changes gradually to a
>>> cluster?
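>>>
>>> For reference, the standard cycle for pushing a new CRUSH map is
>>> decompile, edit, recompile, inject (file names below are
>>> placeholders):
>>>
>>> ceph osd getcrushmap -o /tmp/crushmap.bin
>>> crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
>>> # edit /tmp/crushmap.txt, then:
>>> crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
>>> ceph osd setcrushmap -i /tmp/crushmap.new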
>>>
>>> Thanks for your help.
>>>
>>>
>>> Current crush map:
>>>
>>> # begin crush map
>>> tunable choose_local_tries 0
>>> tunable choose_local_fallback_tries 0
>>> tunable choose_total_tries 50
>>> tunable chooseleaf_descend_once 1
>>> tunable straw_calc_version 1
>>>
>>> # devices
>>> device 0 osd.0
>>> device 1 osd.1
>>> device 2 osd.2
>>> device 3 osd.3
>>> device 4 osd.4
>>> device 5 osd.5
>>> device 6 osd.6
>>> device 7 osd.7
>>> device 8 osd.8
>>> device 9 osd.9
>>> device 10 osd.10
>>> device 11 osd.11
>>> device 12 osd.12
>>> device 13 osd.13
>>> device 14 osd.14
>>> device 15 osd.15
>>> device 16 osd.16
>>> device 17 osd.17
>>> device 18 osd.18
>>> device 19 osd.19
>>> device 20 osd.20
>>> device 21 osd.21
>>> device 22 osd.22
>>> device 23 osd.23
>>> device 24 osd.24
>>> device 25 osd.25
>>> device 26 osd.26
>>> device 27 osd.27
>>> device 28 osd.28
>>> device 29 osd.29
>>> device 30 osd.30
>>> device 31 osd.31
>>> device 32 osd.32
>>> device 33 osd.33
>>> device 34 osd.34
>>> device 35 osd.35
>>> device 36 osd.36
>>> device 37 osd.37
>>> device 38 osd.38
>>> device 39 osd.39
>>>
>>> # types
>>> type 0 osd
>>> type 1 osdgroup
>>> type 2 host
>>> type 3 rack
>>> type 4 site
>>> type 5 root
>>>
>>> # buckets
>>> osdgroup b02s08-osdgroupA {
>>> id -81 # do not change unnecessarily
>>> # weight 18.100
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.0 weight 3.620
>>> item osd.1 weight 3.620
>>> item osd.2 weight 3.620
>>> item osd.3 weight 3.620
>>> item osd.4 weight 3.620
>>> }
>>> osdgroup b02s08-osdgroupB {
>>> id -82 # do not change unnecessarily
>>> # weight 18.100
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.5 weight 3.620
>>> item osd.6 weight 3.620
>>> item osd.7 weight 3.620
>>> item osd.8 weight 3.620
>>> item osd.9 weight 3.620
>>> }
>>> osdgroup b02s08-osdgroupC {
>>> id -83 # do not change unnecessarily
>>> # weight 19.920
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.10 weight 3.620
>>> item osd.11 weight 3.620
>>> item osd.12 weight 3.620
>>> item osd.13 weight 3.620
>>> item osd.14 weight 5.440
>>> }
>>> osdgroup b02s08-osdgroupD {
>>> id -84 # do not change unnecessarily
>>> # weight 19.920
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.15 weight 3.620
>>> item osd.16 weight 3.620
>>> item osd.17 weight 3.620
>>> item osd.18 weight 3.620
>>> item osd.19 weight 5.440
>>> }
>>> host b02s08 {
>>> id -80 # do not change unnecessarily
>>> # weight 76.040
>>> alg straw
>>> hash 0 # rjenkins1
>>> item b02s08-osdgroupA weight 18.100
>>> item b02s08-osdgroupB weight 18.100
>>> item b02s08-osdgroupC weight 19.920
>>> item b02s08-osdgroupD weight 19.920
>>> }
>>> osdgroup b02s12-osdgroupA {
>>> id -121 # do not change unnecessarily
>>> # weight 18.100
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.20 weight 3.620
>>> item osd.21 weight 3.620
>>> item osd.22 weight 3.620
>>> item osd.23 weight 3.620
>>> item osd.24 weight 3.620
>>> }
>>> osdgroup b02s12-osdgroupB {
>>> id -122 # do not change unnecessarily
>>> # weight 18.100
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.25 weight 3.620
>>> item osd.26 weight 3.620
>>> item osd.27 weight 3.620
>>> item osd.28 weight 3.620
>>> item osd.29 weight 3.620
>>> }
>>> osdgroup b02s12-osdgroupC {
>>> id -123 # do not change unnecessarily
>>> # weight 19.920
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.30 weight 3.620
>>> item osd.31 weight 3.620
>>> item osd.32 weight 3.620
>>> item osd.33 weight 3.620
>>> item osd.34 weight 5.440
>>> }
>>> osdgroup b02s12-osdgroupD {
>>> id -124 # do not change unnecessarily
>>> # weight 19.920
>>> alg straw
>>> hash 0 # rjenkins1
>>> item osd.35 weight 3.620
>>> item osd.36 weight 3.620
>>> item osd.37 weight 3.620
>>> item osd.38 weight 3.620
>>> item osd.39 weight 5.440
>>> }
>>> host b02s12 {
>>> id -120 # do not change unnecessarily
>>> # weight 76.040
>>> alg straw
>>> hash 0 # rjenkins1
>>> item b02s12-osdgroupA weight 18.100
>>> item b02s12-osdgroupB weight 18.100
>>> item b02s12-osdgroupC weight 19.920
>>> item b02s12-osdgroupD weight 19.920
>>> }
>>> root replicated-T1 {
>>> id -1 # do not change unnecessarily
>>> # weight 152.080
>>> alg straw
>>> hash 0 # rjenkins1
>>> item b02s08 weight 76.040
>>> item b02s12 weight 76.040
>>> }
>>> rack b02 {
>>> id -20 # do not change unnecessarily
>>> # weight 152.080
>>> alg straw
>>> hash 0 # rjenkins1
>>> item b02s08 weight 76.040
>>> item b02s12 weight 76.040
>>> }
>>> site erbus {
>>> id -10 # do not change unnecessarily
>>> # weight 152.080
>>> alg straw
>>> hash 0 # rjenkins1
>>> item b02 weight 152.080
>>> }
>>>
>>> # rules
>>> rule replicated {
>>> ruleset 0
>>> type replicated
>>> min_size 1
>>> max_size 10
>>> step take replicated-T1
>>> step choose firstn 0 type host
>>> step chooseleaf firstn 0 type osdgroup
>>> step emit
>>> }
>>>
>>> # end crush map
>>>
>>>
>>>
>>>
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



