Hi Daniel, oops, wrong copy-paste. Here are the correct commands:

ceph osd pool get pool-name size
ceph osd pool set pool-name size 2

On Wed, Feb 10, 2016 at 6:27 PM, Ivan Grcic <igrcic@xxxxxxxxx> wrote:
> Grüezi Daniel,
>
> My first question would be: what is your pool size / min_size?
>
> ceph osd pool get pool-name
>
> It is probably 3 (the default). If you want to get back to a healthy state
> with only 2 nodes (all the OSDs on node 3 are down), you have to
> set your pool size to 2:
>
> ceph osd pool set pool-name 2
>
> Regards,
> Ivan
>
> On Wed, Feb 10, 2016 at 5:46 PM, <Daniel.Balsiger@xxxxxxxxxxxx> wrote:
>> Hi Ceph users,
>>
>> This is my first post on this mailing list. I hope it is the correct one; please redirect me to the right place if it is not.
>> I am running a small Ceph cluster (3 nodes, each with 3 OSDs and 1 monitor).
>> Guess what, it is used as Cinder/Glance/Nova RBD storage for OpenStack.
>>
>> I have already replaced single OSDs (faulty disks) without any problems.
>> Now I am facing another problem: the system disk on one of the 3 nodes has failed.
>> So my plan is to take the 3 OSDs of this node out of the cluster, set up the node from scratch, and add the 3 OSDs again.
>>
>> I successfully took out the first 2 OSDs.
>> Yes, I hit the corner case: I did it with "ceph osd crush reweight osd.<OSD#> 0.0", waited for active+clean, and followed with "ceph osd out <OSD#>".
>> Status is now:
>>
>>     cluster d1af2097-8535-42f2-ba8c-0667f90cab61
>>      health HEALTH_WARN
>>             too many PGs per OSD (329 > max 300)
>>             1 mons down, quorum 0,2 ceph0,ceph2
>>      monmap e1: 3 mons at {ceph0=10.0.0.30:6789/0,ceph1=10.0.0.31:6789/0,ceph2=10.0.0.32:6789/0}
>>             election epoch 482, quorum 0,2 ceph0,ceph2
>>      osdmap e1628: 9 osds: 9 up, 7 in
>>       pgmap v2187375: 768 pgs, 3 pools, 38075 MB data, 9129 objects
>>             119 GB used, 6387 GB / 6506 GB avail
>>                  768 active+clean
>>
>> HEALTH_WARN is because of 1 monitor down (the broken node) and too many PGs per OSD (329 > max 300), since I am removing OSDs.
>>
>> Now the problem I am facing: when I try to reweight the 3rd OSD to 0, the cluster never reaches the active+clean state anymore.
>>
>> # ceph --version
>> ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>> # ceph osd crush reweight osd.7 0.0
>> reweighted item id 7 name 'osd.7' to 0 in crush map
>> # ceph -s
>>     cluster d1af2097-8535-42f2-ba8c-0667f90cab61
>>      health HEALTH_WARN
>>             768 pgs stuck unclean
>>             recovery 817/27387 objects degraded (2.983%)
>>             recovery 9129/27387 objects misplaced (33.333%)
>>             1 mons down, quorum 0,2 ceph0,ceph2
>>      monmap e1: 3 mons at {ceph0=10.0.0.30:6789/0,ceph1=10.0.0.31:6789/0,ceph2=10.0.0.32:6789/0}
>>             election epoch 482, quorum 0,2 ceph0,ceph2
>>      osdmap e1682: 9 osds: 9 up, 7 in; 768 remapped pgs
>>       pgmap v2187702: 768 pgs, 3 pools, 38076 MB data, 9129 objects
>>             119 GB used, 6387 GB / 6506 GB avail
>>             817/27387 objects degraded (2.983%)
>>             9129/27387 objects misplaced (33.333%)
>>                  768 active+remapped
>>
>> I also noticed that I need to reweight it to 0.7 to get back to the active+clean state.
>>
>> Any idea how to remove this last OSD so that I can set up the node again?
>> Thank you in advance; any help is appreciated.
>>
>> Daniel
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
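
Putting the thread together, a minimal sketch of the sequence being suggested, assuming osd.7 is the last OSD to drain and "pool-name" stands in for each actual pool (e.g. the Cinder/Glance/Nova pools); adjust names and IDs to your cluster:

    # With only 2 nodes left, a 3-replica pool cannot be placed cleanly,
    # which is why the PGs stay active+remapped. Check the replica count
    # and, if 2 copies are acceptable, lower it:
    ceph osd pool get pool-name size
    ceph osd pool set pool-name size 2

    # Then drain the last OSD as before, wait for active+clean, and mark it out:
    ceph osd crush reweight osd.7 0.0
    ceph osd out 7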