Peter, hi.
Thanks for the reply; let me check that out and get back to you.

On Wed, Jun 7, 2017 at 4:13 AM, Peter Maloney
<peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
> On 06/06/17 19:23, Alejandro Comisario wrote:
>> Hi all, I have a multi-datacenter, 6-node (6 OSD) Ceph Jewel cluster.
>> There are 3 pools in the cluster, all three with size 3 and min_size 2.
>>
>> Today I shut down all three nodes (controlled and in order) in
>> datacenter "CPD2" just to validate that everything keeps working on
>> "CPD1", which it did (including rebalance of the information).
>>
>> After everything was off in CPD2, the "osd tree" looks like this,
>> which seems OK:
>>
>> root@oskceph01:~# ceph osd tree
>> ID WEIGHT   TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 30.00000 root default
>> -8 15.00000     datacenter CPD1
>> -2  5.00000         host oskceph01
>>  0  5.00000             osd.0           up  1.00000          1.00000
>> -6  5.00000         host oskceph05
>>  4  5.00000             osd.4           up  1.00000          1.00000
>> -4  5.00000         host oskceph03
>>  2  5.00000             osd.2           up  1.00000          1.00000
>> -9 15.00000     datacenter CPD2
>> -3  5.00000         host oskceph02
>>  1  5.00000             osd.1         down        0          1.00000
>> -5  5.00000         host oskceph04
>>  3  5.00000             osd.3         down        0          1.00000
>> -7  5.00000         host oskceph06
>>  5  5.00000             osd.5         down        0          1.00000
>>
>> ...
>>
>> root@oskceph01:~# ceph pg dump | egrep degrad
>> dumped all in format plain
>> 8.1b3 178 0 178 0 0 1786814 3078 3078 active+undersized+degraded
>> 2017-06-06 13:11:46.130567 2361'250952 2361:248472 [0,2] 0 [0,2] 0
>> 1889'249956 2017-06-06 04:11:52.736214 1889'242115 2017-06-03
>> 19:07:06.615674
>>
>> For some strange reason, I see that the acting set is [0,2]; I don't
>> see osd.4 in the acting set, and honestly, I don't know why.
>>
>> ...
> I'm assuming you have the failure domain set to host, not datacenter?
> (Otherwise you'd never get [0,2]... and size 3 could never work either.)
>
> So then it looks like a problem I had and solved this week... I had 60
> OSDs with 19 down to be replaced, and one PG out of 1152 wouldn't peer.
> Randomly I realized what was wrong... there's a tunable,
> "choose_total_tries", you can increase so the PGs that tried to find an
> OSD that many times and failed will try more:
>
>> ceph osd getcrushmap -o crushmap
>> crushtool -d crushmap -o crushmap.txt
>> vim crushmap.txt
>>     here you raise the tunable choose_total_tries... the default is
>>     50. 100 worked for me the first time, and then later I changed it
>>     again to 200.
>> crushtool -c crushmap.txt -o crushmap.new
>> ceph osd setcrushmap -i crushmap.new
>
> If anything goes wrong with the new crushmap, you can always set the old
> one again:
>> ceph osd setcrushmap -i crushmap
>
> Then you have to wait some time, maybe 30 s, before the PGs peer.
>
> Now, if only there were a log message or warning in "ceph -s" saying the
> tries were exceeded, this solution would be more obvious (and we would
> know whether it applies to you)...

--
Alejandro Comisario
CTO | NUBELIU
E-mail: alejandro@nubeliu.com
Cell: +54 9 11 3770 1857
www.nubeliu.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
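
A minimal sketch of the first checks implied above, assuming the stuck PG is
8.1b3 (taken from the pg dump in the thread) and that the pool uses a standard
replicated rule; the rule dump shows which bucket type the chooseleaf step
uses, i.e. the real failure domain Peter is asking about:

    # Peering details for the undersized PG: up/acting sets, past intervals,
    # and the peering state machine's view of why only two OSDs were chosen.
    ceph pg 8.1b3 query

    # Dump the CRUSH rules to confirm whether the affected pool's rule does
    # "step chooseleaf ... type host" or "... type datacenter".
    ceph osd crush rule dump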
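
Peter's crushmap steps, written out as a sketch showing what the tunable line
looks like in the decompiled map, plus an optional crushtool --test dry run
before injecting the new map (the rule id 0 is an assumption; take the real id
from "ceph osd crush rule dump"):

    ceph osd getcrushmap -o crushmap
    crushtool -d crushmap -o crushmap.txt

    # In crushmap.txt the tunables block sits at the top; raise the line
    #   tunable choose_total_tries 50
    # to something like
    #   tunable choose_total_tries 100

    crushtool -c crushmap.txt -o crushmap.new

    # Optional sanity check: map test inputs through rule 0 with 3 replicas
    # and report any inputs that could not be mapped to 3 OSDs.
    crushtool -i crushmap.new --test --rule 0 --num-rep 3 --show-bad-mappings

    ceph osd setcrushmap -i crushmap.new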
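
And a few standard commands to watch whether the change actually lets the PG
peer and go active+clean again (nothing here is specific to this cluster):

    # Cluster-wide summary; the undersized+degraded count should drop.
    ceph -s

    # Per-PG detail for anything still flagged unhealthy.
    ceph health detail

    # PGs that are still not active+clean.
    ceph pg dump_stuck unclean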