Thanks Robert. Will definitely try this. Is there a way to implement
“gradual CRUSH” changes? I noticed that whenever cluster-wide changes are
pushed (a new crush map, for instance), the cluster immediately attempts to
realign itself, disrupting client access / performance… (see the throttling
sketch appended at the end of this thread).

> On Jan 18, 2016, at 12:22, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> I'm not sure why you have six monitors. Six monitors buys you nothing
> over five monitors other than more power being used, more latency, and
> more headache. See
> http://docs.ceph.com/docs/hammer/rados/configuration/mon-config-ref/#monitor-quorum
> for some more info. Also, I'd consider 5 monitors overkill for this
> size cluster; I'd recommend three.
>
> Although this is most likely not the root cause of your problem, you
> probably have an error here: "root replicated-T1" points directly to
> b02s08 and b02s12, and "site erbus" (through "rack b02") also reaches
> b02s08 and b02s12. You probably meant to have "root replicated-T1"
> point to erbus instead.
>
> Where I think your problem lies is in your "rule replicated" section.
> You can try:
>
>     step take replicated-T1
>     step choose firstn 2 type host
>     step chooseleaf firstn 2 type osdgroup
>     step emit
>
> What this does is choose two hosts from the root replicated-T1 (which
> happens to be both hosts you have), then, on each of those hosts,
> choose one OSD from each of two osdgroups.
>
> I believe the problem with your current rule is that "step choose
> firstn 0 type host" tries to select four hosts (one per replica), but
> only two are available. You should be able to see that with
> 'ceph pg dump', where only two OSDs will be listed in each PG's up set.
>
> I hope that helps.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
>
> On Sun, Jan 17, 2016 at 6:31 PM, deeepdish <deeepdish@xxxxxxxxx> wrote:
>> Hi Everyone,
>>
>> Looking for a double check of my logic and crush map.
>>
>> Overview:
>>
>> - osdgroup bucket type defines a failure domain within a host of 5 OSDs
>>   + 1 SSD. Therefore 5 OSDs (all utilizing the same SSD journal)
>>   constitute an osdgroup bucket. Each host has 4 osdgroups.
>> - 6 monitors
>> - Two-node cluster
>> - Each node:
>>   - 20 OSDs
>>   - 4 SSDs
>>   - 4 osdgroups
>>
>> Desired Crush Rule outcome:
>> - Assuming a pool with min_size=2 and size=4, each node would contain a
>>   redundant copy of each object. Should either host fail, access to
>>   data would be uninterrupted.
>>
>> Current Crush Rule outcome:
>> - There are 4 copies of each object; however, I don’t believe each node
>>   has a redundant copy of each object. When a node fails, data is NOT
>>   accessible until ceph rebuilds itself / the node becomes accessible
>>   again.
>>
>> I suspect my crush map is not right, and remedying it may take some time
>> and cause the cluster to be unresponsive / unavailable. Is there a way /
>> method to apply substantial crush changes gradually to a cluster?
>>
>> Thanks for your help.
>>
>>
>> Current crush map:
>>
>> # begin crush map
>> tunable choose_local_tries 0
>> tunable choose_local_fallback_tries 0
>> tunable choose_total_tries 50
>> tunable chooseleaf_descend_once 1
>> tunable straw_calc_version 1
>>
>> # devices
>> device 0 osd.0
>> device 1 osd.1
>> device 2 osd.2
>> device 3 osd.3
>> device 4 osd.4
>> device 5 osd.5
>> device 6 osd.6
>> device 7 osd.7
>> device 8 osd.8
>> device 9 osd.9
>> device 10 osd.10
>> device 11 osd.11
>> device 12 osd.12
>> device 13 osd.13
>> device 14 osd.14
>> device 15 osd.15
>> device 16 osd.16
>> device 17 osd.17
>> device 18 osd.18
>> device 19 osd.19
>> device 20 osd.20
>> device 21 osd.21
>> device 22 osd.22
>> device 23 osd.23
>> device 24 osd.24
>> device 25 osd.25
>> device 26 osd.26
>> device 27 osd.27
>> device 28 osd.28
>> device 29 osd.29
>> device 30 osd.30
>> device 31 osd.31
>> device 32 osd.32
>> device 33 osd.33
>> device 34 osd.34
>> device 35 osd.35
>> device 36 osd.36
>> device 37 osd.37
>> device 38 osd.38
>> device 39 osd.39
>>
>> # types
>> type 0 osd
>> type 1 osdgroup
>> type 2 host
>> type 3 rack
>> type 4 site
>> type 5 root
>>
>> # buckets
>> osdgroup b02s08-osdgroupA {
>>         id -81          # do not change unnecessarily
>>         # weight 18.100
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.0 weight 3.620
>>         item osd.1 weight 3.620
>>         item osd.2 weight 3.620
>>         item osd.3 weight 3.620
>>         item osd.4 weight 3.620
>> }
>> osdgroup b02s08-osdgroupB {
>>         id -82          # do not change unnecessarily
>>         # weight 18.100
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.5 weight 3.620
>>         item osd.6 weight 3.620
>>         item osd.7 weight 3.620
>>         item osd.8 weight 3.620
>>         item osd.9 weight 3.620
>> }
>> osdgroup b02s08-osdgroupC {
>>         id -83          # do not change unnecessarily
>>         # weight 19.920
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.10 weight 3.620
>>         item osd.11 weight 3.620
>>         item osd.12 weight 3.620
>>         item osd.13 weight 3.620
>>         item osd.14 weight 5.440
>> }
>> osdgroup b02s08-osdgroupD {
>>         id -84          # do not change unnecessarily
>>         # weight 19.920
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.15 weight 3.620
>>         item osd.16 weight 3.620
>>         item osd.17 weight 3.620
>>         item osd.18 weight 3.620
>>         item osd.19 weight 5.440
>> }
>> host b02s08 {
>>         id -80          # do not change unnecessarily
>>         # weight 76.040
>>         alg straw
>>         hash 0  # rjenkins1
>>         item b02s08-osdgroupA weight 18.100
>>         item b02s08-osdgroupB weight 18.100
>>         item b02s08-osdgroupC weight 19.920
>>         item b02s08-osdgroupD weight 19.920
>> }
>> osdgroup b02s12-osdgroupA {
>>         id -121         # do not change unnecessarily
>>         # weight 18.100
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.20 weight 3.620
>>         item osd.21 weight 3.620
>>         item osd.22 weight 3.620
>>         item osd.23 weight 3.620
>>         item osd.24 weight 3.620
>> }
>> osdgroup b02s12-osdgroupB {
>>         id -122         # do not change unnecessarily
>>         # weight 18.100
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.25 weight 3.620
>>         item osd.26 weight 3.620
>>         item osd.27 weight 3.620
>>         item osd.28 weight 3.620
>>         item osd.29 weight 3.620
>> }
>> osdgroup b02s12-osdgroupC {
>>         id -123         # do not change unnecessarily
>>         # weight 19.920
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.30 weight 3.620
>>         item osd.31 weight 3.620
>>         item osd.32 weight 3.620
>>         item osd.33 weight 3.620
>>         item osd.34 weight 5.440
>> }
>> osdgroup b02s12-osdgroupD {
>>         id -124         # do not change unnecessarily
>>         # weight 19.920
>>         alg straw
>>         hash 0  # rjenkins1
>>         item osd.35 weight 3.620
>>         item osd.36 weight 3.620
>>         item osd.37 weight 3.620
>>         item osd.38 weight 3.620
>>         item osd.39 weight 5.440
>> }
>> host b02s12 {
>>         id -120         # do not change unnecessarily
>>         # weight 76.040
>>         alg straw
>>         hash 0  # rjenkins1
>>         item b02s12-osdgroupA weight 18.100
>>         item b02s12-osdgroupB weight 18.100
>>         item b02s12-osdgroupC weight 19.920
>>         item b02s12-osdgroupD weight 19.920
>> }
>> root replicated-T1 {
>>         id -1           # do not change unnecessarily
>>         # weight 152.080
>>         alg straw
>>         hash 0  # rjenkins1
>>         item b02s08 weight 76.040
>>         item b02s12 weight 76.040
>> }
>> rack b02 {
>>         id -20          # do not change unnecessarily
>>         # weight 152.080
>>         alg straw
>>         hash 0  # rjenkins1
>>         item b02s08 weight 76.040
>>         item b02s12 weight 76.040
>> }
>> site erbus {
>>         id -10          # do not change unnecessarily
>>         # weight 152.080
>>         alg straw
>>         hash 0  # rjenkins1
>>         item b02 weight 152.080
>> }
>>
>> # rules
>> rule replicated {
>>         ruleset 0
>>         type replicated
>>         min_size 1
>>         max_size 10
>>         step take replicated-T1
>>         step choose firstn 0 type host
>>         step chooseleaf firstn 0 type osdgroup
>>         step emit
>> }
>>
>> # end crush map
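
Spelled out against the map above, Robert's suggested steps would slot into
the existing rule roughly as follows. This is only a sketch of his reply,
not a tested rule for this cluster:

    rule replicated {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take replicated-T1
            step choose firstn 2 type host
            step chooseleaf firstn 2 type osdgroup
            step emit
    }

One way to sanity-check a modified map offline before injecting it is
crushtool's test mode (the file names here are placeholders):

    ceph osd getcrushmap -o /tmp/crushmap.bin
    crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
    # edit the rule in /tmp/crushmap.txt, then recompile and test it
    crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
    crushtool -i /tmp/crushmap.new --test --rule 0 --num-rep 4 --show-mappings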
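
Robert's other observation (the root pointing straight at the hosts) would,
taken literally, make the root contain the site instead. A hedged sketch,
reusing the weights already in the map and changing nothing else:

    root replicated-T1 {
            id -1           # do not change unnecessarily
            # weight 152.080
            alg straw
            hash 0  # rjenkins1
            item erbus weight 152.080
    }

rack b02 and site erbus stay as defined above; only the root's item list
changes. Note that when the map is recompiled, erbus (and rack b02) must be
declared before the root that references them, and "step choose firstn 2
type host" still resolves to b02s08 and b02s12 because CRUSH descends
through the site and rack to the host buckets.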
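
On the “gradual CRUSH” question at the top of the thread: CRUSH recalculates
placements as soon as a new map or weight is injected, so there is no
built-in way to apply the map itself gradually, but the backfill/recovery
traffic that follows can be throttled so client I/O keeps priority. A
minimal sketch with commonly used knobs; the values and the file name are
illustrative assumptions, not tuned recommendations:

    # limit concurrent backfill/recovery work per OSD so client I/O stays responsive
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

    # optionally hold all data movement while the new map goes in...
    ceph osd set nobackfill
    ceph osd set norecover
    ceph osd setcrushmap -i /tmp/crushmap.new

    # ...then release it and let recovery proceed at the throttled rate
    ceph osd unset nobackfill
    ceph osd unset norecover

    # large weight changes can also be stepped in small increments, e.g.
    ceph osd crush reweight osd.14 4.0    # hypothetical intermediate weight

Settings applied with injectargs do not survive an OSD restart; mirroring
them in the [osd] section of ceph.conf keeps them persistent.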