Re: Ceph extension - how to equilibrate ?

Hi Pascal,

 

I ran into the same situation some time ago: a small cluster where I added a node with HDDs double the size of the existing ones. I wrote about it here: http://ceph.com/planet/the-schrodinger-ceph-cluster/

 

When adding OSDs to a cluster, rebalancing/data movement is unavoidable in most cases. Since you will be going from a 144 TB cluster to a 240 TB cluster, you can estimate that roughly 66% of your data will be rebalanced/moved.
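
(That figure lines up with the raw capacity increase; a quick back-of-the-envelope check, with the drive counts/sizes taken from your description:)

echo $((3 * 12 * 4))       # 144 TB existing (3 nodes x 12 OSDs x 4 TB)
echo $((12 * 8))           # 96 TB added (12 x 8 TB)
echo $((96 * 100 / 144))   # ~66% capacity increase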

 

Peter already covered how to move the HDDs from one server to another (incl. journal). I just want to point out that you can run "ceph osd crush set" before you do the physical move of the drives. This lets you rebalance on your own terms (schedule, rollback, etc.).

 

The easy way:

- Create the new OSDs (8 TB) with weight 0
- Move each OSD to its desired location and weight: "ceph osd crush set osd.X <desired-weight> root=<root> host=<desired-host>" (see the sketch after this list)
- Monitor and wait for the rebalance to finish (a few days or weeks depending on performance)
- Set noout && physically move the drives && unset noout
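
As a rough sketch of those steps (the OSD ids, weights and host names below are made up, so adjust them to your crush map; crush weights are usually the drive size in TiB, so ~7.27 for an 8 TB drive and ~3.64 for a 4 TB one):

# New OSDs can come up with weight 0 by setting this in ceph.conf before creating them:
#   [osd]
#   osd crush initial weight = 0

# Place each OSD where it should end up:
ceph osd crush set osd.36 7.27 root=default host=node4
ceph osd crush set osd.3 3.64 root=default host=node4

# Watch the rebalance:
ceph -s
ceph osd df

# Once healthy again, do the physical move:
ceph osd set noout
# ...stop, unmount, move, remount and start the affected OSDs...
ceph osd unset noout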

 

In production, you also want to consider the recovery/backfill op priority and the granularity of the increase (raising weights progressively, etc.); a sketch follows below.
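
For example (illustrative values only, using the standard recovery/backfill knobs; tune them to your workload):

# Throttle recovery/backfill so client I/O keeps priority:
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

# Raise a new OSD's weight in steps rather than all at once (example id/weights):
ceph osd crush reweight osd.36 2.0
# ...wait for HEALTH_OK...
ceph osd crush reweight osd.36 4.0
# ...and so on up to the target weight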

 

Cheers,

Maxime

 

From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx>
Date: Tuesday 18 April 2017 20:26
To: "pascal.pucci@xxxxxxxxxxxxxxx" <pascal.pucci@xxxxxxxxxxxxxxx>, "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: [ceph-users] Ceph extension - how to equilibrate ?

 

On 04/18/17 16:31, pascal.pucci@xxxxxxxxxxxxxxx wrote:

Hello,

Just looking for some advice: next time, I will extend my Jewel Ceph cluster with a fourth node.

Currently, we have 3 nodes of 12 OSDs each with 4 TB drives (36 x 4 TB drives).

I will add a new node with 12 x 8 TB drives (12 new OSDs => 48 OSDs total).

I hope those aren't SMR disks... make sure they're not, or it will be very slow, to the point where OSDs will time out and die.

So, how do I rebalance simply?

How can I just unplug 3 x 4 TB drives from each node, add them to the fourth node, and plug 3 x 8 TB drives from the fourth node into each existing node?

I think you only have to stop them (hopefully not so many at once that objects go missing; optionally set noout first), unmount them, move the disks, then mount them and start them on the new node. Then update the crush location:

ceph osd crush move osd.X host=nodeY

If your journals aren't being moved too, then flush the journals after the osds are stopped:

sync
ceph-osd --id $n --setuser ceph --setgroup ceph --flush-journal

(if that crashes, start the osd, then stop again, and retry)

and before starting them, make new journals.

ceph-osd --id $n --setuser ceph --setgroup ceph --mkjournal
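
Putting those steps together for a single OSD, a rough sketch (the OSD id, device and host name are examples, and this assumes systemd-managed OSDs with the default mount path):

ceph osd set noout                    # optional, avoids rebalancing while the OSD is down
systemctl stop ceph-osd@12
sync
ceph-osd --id 12 --setuser ceph --setgroup ceph --flush-journal   # only if the journal stays behind
umount /var/lib/ceph/osd/ceph-12
# ...physically move the disk to nodeY, then on nodeY:
mount /dev/sdX1 /var/lib/ceph/osd/ceph-12
ceph-osd --id 12 --setuser ceph --setgroup ceph --mkjournal       # only if the OSD needs a new journal
systemctl start ceph-osd@12
ceph osd crush move osd.12 host=nodeY
ceph osd unset noout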


At the end, I want 3 x 8 TB drives and 9 x 4 TB drives per node.

What is the easiest way to do that?

I don't want to move all the data: it would take a long time per OSD...

I don't know how much data this will move, if any... but if it moves data, you probably don't have a choice.

Is there a way to just switch OSDs between nodes?

Thanks for your help.

Pascal,





 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
