Re: Question: replacing all OSDs of one node in 3node cluster

Hi Mihai, Grüezi Ivan :)

Thank you both for the fast replies. It's appreciated.
When I bootstrapped the cluster I used
--
osd_pool_default_size = 3
osd_pool_default_min_size = 2
--
in ceph.conf. This is also set for each pool at the moment.
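
(To double-check what each pool currently uses, the values can be queried per pool; "volumes" is just an example pool name:)
--
# ceph osd pool get volumes size
size: 3
# ceph osd pool get volumes min_size
min_size: 2
--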

I understood from the docs that this means each object is stored 3 times.
The docs also say about min_size:
-
Sets the minimum number of written replicas for objects in the pool in order to acknowledge a write operation to the client. If minimum is not met, Ceph will not acknowledge the write to the client. This setting ensures a minimum number of replicas when operating in degraded mode.
-
Since I still wanted I/O from glance/nova/cinder to be possible when I had to take down one of the nodes for kernel updates or H/W changes, I chose those settings. And this use case worked well.
So I wanted to force the cluster into the state where one node fails completely, by taking all its OSDs out, while still reaching active+clean, i.e. still holding 3 copies but with only two nodes. It seems this is not possible. My bad.

If I understood you correctly, one cannot achieve an active+clean state at all with a pool size of 3 and only two nodes up, regardless of the number of OSDs per node, right?
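
My own guess at the reason: the default CRUSH rule uses the host as the failure domain, so with size 3 every PG wants its three replicas on three different hosts, which two nodes cannot provide. If I read the docs right, this is visible in the rule dump (output trimmed, rule layout may differ on other setups):
--
# ceph osd crush rule dump
...
        { "op": "chooseleaf_firstn",
          "num": 0,
          "type": "host"},
...
--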

When setting the pool size to 2, everything went fine.
I have an infrastructure management system in place that installed the new node quickly; it also adds the Ceph keys and confs and does not touch any disks used for OSDs.
Everything now works as expected and all nodes are fully functional again. Thank you very much for the help.
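
For the record, the change boils down to Ivan's corrected command, run once per pool (the pool names below are just the usual OpenStack ones, yours may differ):
--
# ceph osd pool set images size 2
# ceph osd pool set vms size 2
# ceph osd pool set volumes size 2
--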

Best, Daniel

---

From: Mihai Gheorghe [mailto:mcapsali@xxxxxxxxx] 
Sent: Wednesday, 10 February 2016 18:51
To: Ivan Grcic <igrcic@xxxxxxxxx>
Cc: Balsiger Daniel, INI-INO-ECO-MXT <Daniel.Balsiger@xxxxxxxxxxxx>; ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: Re:  Question: replacing all OSDs of one node in 3node cluster

As far as I know you can do it in two ways (assuming you have a pool size of 3 across all 3 nodes, with min_size 2, so you still have access to the data):

1. Set noout so the cluster does not start rebalancing. Reinstall the OS on the faulty node and redeploy the node with all keys and conf files (either manually or with ceph-deploy) without re-zapping the OSDs. Activate the OSDs and unset noout. This should be enough, because the OSDs have the same cluster UUID.

2. Set the pool size to 2 and remove the faulty node's OSDs from the crush map (before setting an OSD out, first reweight it to 0, then set it out, then remove it from the cluster), and wait for the rebalance, if any. Reinstall the OS on the node and redeploy ceph on it, creating new OSDs (they should get the same ids as before).

I have successfully done the first one a while back and it worked, although I don't remember the exact commands I used.
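
From the docs rather than from my shell history, the two sequences should look roughly like this (OSD ids, device paths and service commands below are only examples, adapt them to your setup):

1)
--
# ceph osd set noout
  (reinstall the OS, put back ceph.conf and the keyrings)
# ceph-disk activate /dev/sdb1          (once per OSD data partition)
# ceph osd unset noout
--

2)
--
# ceph osd crush reweight osd.7 0
  (wait until the cluster is active+clean again)
# ceph osd out 7
  (stop the daemon, e.g. "service ceph stop osd.7" depending on the init system)
# ceph osd crush remove osd.7
# ceph auth del osd.7
# ceph osd rm 7
  (repeat for each OSD on the node, then reinstall and create new OSDs)
--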



2016-02-10 19:29 GMT+02:00 Ivan Grcic <igrcic@xxxxxxxxx>:
Hi Daniel,

oops, wrong copy paste, here are the correct commands:

ceph osd pool get pool-name size

ceph osd pool set pool-name size 2

On Wed, Feb 10, 2016 at 6:27 PM, Ivan Grcic <igrcic@xxxxxxxxx> wrote:
> Grüezi Daniel,
>
> my first question would be: What's your pool size / min_size?
>
> ceph osd pool get pool-name
>
> It is probably 3 (the default size). If you want to have a healthy state
> again with only 2 nodes (all the OSDs on node 3 are down), you have to
> set your pool size to 2:
>
> ceph osd pool set pool-name 2
>
> Regards,
> Ivan
>
> On Wed, Feb 10, 2016 at 5:46 PM,  <Daniel.Balsiger@xxxxxxxxxxxx> wrote:
>> Hi Ceph users
>>
>> This is my first post on this mailing list. Hope it's the correct one. Please redirect me to the right place in case it is not.
>> I am running a small Ceph cluster (3 nodes, with 3 OSDs and 1 monitor on each of them).
>> Guess what, it is used as Cinder/Glance/Nova RBD storage for OpenStack.
>>
>> I already replaced some single OSD (faulty disk) without any problems.
>> Now I am facing another problem since the system disk on one of the 3 nodes failed.
>> So I thought I would take the 3 OSDs of this node out of the cluster, set up the node from scratch, and add the 3 OSDs again.
>>
>> I did successfully take out the first 2 OSDs.
>> Yes, I hit the corner case: I did it with "ceph osd crush reweight osd.<OSD#> 0.0", waited for active+clean, and followed with "ceph osd out <OSD#>"
>> Status is now:
>>
>> cluster d1af2097-8535-42f2-ba8c-0667f90cab61
>>      health HEALTH_WARN
>>             too many PGs per OSD (329 > max 300)
>>             1 mons down, quorum 0,2 ceph0,ceph2
>>      monmap e1: 3 mons at {ceph0=10.0.0.30:6789/0,ceph1=10.0.0.31:6789/0,ceph2=10.0.0.32:6789/0}
>>             election epoch 482, quorum 0,2 ceph0,ceph2
>>      osdmap e1628: 9 osds: 9 up, 7 in
>>       pgmap v2187375: 768 pgs, 3 pools, 38075 MB data, 9129 objects
>>             119 GB used, 6387 GB / 6506 GB avail
>>                  768 active+clean
>>
>> HEALTH_WARN is because of 1 monitor being down (the broken node) and too many PGs per OSD (329 > max 300), since I am removing OSDs.
>>
>> Now the problem I am facing: when I try to reweight the 3rd OSD to 0, the cluster never reaches the active+clean state anymore.
>> # ceph --version
>> ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>> # ceph osd crush reweight osd.7 0.0
>> reweighted item id 7 name 'osd.7' to 0 in crush map
>> # ceph -s
>> cluster d1af2097-8535-42f2-ba8c-0667f90cab61
>>      health HEALTH_WARN
>>             768 pgs stuck unclean
>>             recovery 817/27387 objects degraded (2.983%)
>>             recovery 9129/27387 objects misplaced (33.333%)
>>             1 mons down, quorum 0,2 ceph0,ceph2
>>      monmap e1: 3 mons at {ceph0=10.0.0.30:6789/0,ceph1=10.0.0.31:6789/0,ceph2=10.0.0.32:6789/0}
>>             election epoch 482, quorum 0,2 ceph0,ceph2
>>      osdmap e1682: 9 osds: 9 up, 7 in; 768 remapped pgs
>>       pgmap v2187702: 768 pgs, 3 pools, 38076 MB data, 9129 objects
>>             119 GB used, 6387 GB / 6506 GB avail
>>             817/27387 objects degraded (2.983%)
>>             9129/27387 objects misplaced (33.333%)
>>                  768 active+remapped
>>
>> I also noticed that I need to reweight it to 0.7 to get to the active+clean state again.
>>
>> Any idea how to remove this last OSD so that I can set up the node again?
>> Thank you in advance, any help appreciated.
>>
>> Daniel

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



