Re: PG that should not be on undersized+degraded on multi datacenter Ceph cluster

Peter, hi ... what happened to me is exactly what happened to you,
thanks so much for pointing that out!

I'm amazed at how you realized that was the problem!
Maybe that will help me troubleshoot a little more like a pro.

Best.

On Wed, Jun 7, 2017 at 5:06 PM, Alejandro Comisario
<alejandro@xxxxxxxxxxx> wrote:
> Peter, hi.
> thanks for the reply, let me check that out, and get back to you
>
> On Wed, Jun 7, 2017 at 4:13 AM, Peter Maloney
> <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
>> On 06/06/17 19:23, Alejandro Comisario wrote:
>>> Hi all, I have a multi-datacenter, 6-node (6 OSD) Ceph Jewel cluster.
>>> There are 3 pools in the cluster, all three with size 3 and min_size 2.
>>>
>>> Today, I shut down all three nodes (controlled and in order) in
>>> datacenter "CPD2" just to validate that everything keeps working on
>>> "CPD1", which it did (including rebalancing of the data).
>>>
>>> After everything was off in CPD2, the "osd tree" looked like this,
>>> which seems OK.
>>>
>>> root@oskceph01:~# ceph osd tree
>>> ID WEIGHT   TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>> -1 30.00000 root default
>>> -8 15.00000     datacenter CPD1
>>> -2  5.00000         host oskceph01
>>>  0  5.00000             osd.0           up  1.00000          1.00000
>>> -6  5.00000         host oskceph05
>>>  4  5.00000             osd.4           up  1.00000          1.00000
>>> -4  5.00000         host oskceph03
>>>  2  5.00000             osd.2           up  1.00000          1.00000
>>> -9 15.00000     datacenter CPD2
>>> -3  5.00000         host oskceph02
>>>  1  5.00000             osd.1         down        0          1.00000
>>> -5  5.00000         host oskceph04
>>>  3  5.00000             osd.3         down        0          1.00000
>>> -7  5.00000         host oskceph06
>>>  5  5.00000             osd.5         down        0          1.00000
>>>
>>> ...
>>>
>>> root@oskceph01:~# ceph pg dump | egrep degrad
>>> dumped all in format plain
>>> 8.1b3 178 0 178 0 0 1786814 3078 3078 active+undersized+degraded
>>> 2017-06-06 13:11:46.130567 2361'250952 2361:248472 [0,2] 0 [0,2] 0
>>> 1889'249956 2017-06-06 04:11:52.736214 1889'242115 2017-06-03
>>> 19:07:06.615674
>>>
>>> For some strange reason, I see that the acting set is [0,2]; I don't
>>> see osd.4 in the acting set and, honestly, I don't know why.
>>>
>>> ...
>> I'm assuming you have the failure domain set to host, not datacenter?
>> (otherwise you'd never get [0,2] ... and size 3 could never work either)
>>
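For reference, a minimal sketch of what a host-level failure domain looks
like in a decompiled CRUSH map. This assumes a stock replicated rule; the
rule and root names are illustrative, not taken from this cluster:

    # excerpt from crushmap.txt (Jewel-era syntax)
    rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        # one replica per host; "type datacenter" here would instead
        # spread replicas across datacenters
        step chooseleaf firstn 0 type host
        step emit
    }
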
>> So then it looks like a problem I had and solved this week... I had 60
>> osds with 19 down to be replaced, and one pg out of 1152 wouldn't peer.
>> Randomly I realized what was wrong... there's a tunable,
>> "choose_total_tries", that you can increase so that PGs which tried to
>> find an OSD that many times and failed will try more:
>>
>>> ceph osd getcrushmap -o crushmap
>>> crushtool -d crushmap -o crushmap.txt
>>> vim crushmap.txt
>>>     here you raise tunable choose_total_tries... the default is 50.
>>> 100 worked for me the first time, and later I changed it again to
>>> 200.
>>> crushtool -c crushmap.txt -o crushmap.new
>>> ceph osd setcrushmap -i crushmap.new
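
For illustration, the line Peter is editing sits near the top of the
decompiled crushmap.txt among the other "tunable" entries. A sketch of that
section follows; the exact set of tunables depends on your tunable profile,
and the values shown are examples, not taken from this cluster:

    # begin crush map
    tunable choose_local_tries 0
    tunable choose_local_fallback_tries 0
    # raised from 50 so CRUSH retries the mapping more often
    tunable choose_total_tries 100
    tunable chooseleaf_descend_once 1
    tunable chooseleaf_vary_r 1
    tunable straw_calc_version 1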
>>
>> If anything goes wrong with the new crushmap, you can always set the old
>> one again:
>>> ceph osd setcrushmap -i crushmap
>>
>> Then you have to wait some time, maybe 30s, before the PGs start peering.
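
As an aside (not from the thread): a few standard commands to watch that
stuck PG recover after injecting the new map, using the pg id 8.1b3 from
the dump above:

    ceph -s                       # overall health; degraded/undersized counts should drop
    ceph pg dump_stuck unclean    # PGs that are still not active+clean
    ceph pg 8.1b3 query           # per-PG peering detail (recovery_state, blocked_by, ...)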
>>
>> Now if only there were a log message or warning in "ceph -s" saying that
>> the number of tries was exceeded, this solution would be more obvious
>> (and we would know whether it applies to you)...
>>
>
>
>
> --
> Alejandro Comisario
> CTO | NUBELIU
> E-mail: alejandro@nubeliu.com  Cell: +54 9 11 3770 1857
> _
> www.nubeliu.com



-- 
Alejandro Comisario
CTO | NUBELIU
E-mail: alejandro@nubeliu.com  Cell: +54 9 11 3770 1857
_
www.nubeliu.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


