Re: scalability new node to the existing cluster

I keep seeing these threads where adding nodes has such an impact on the cluster as a whole that I wonder what the rest of the cluster looks like. Normally I’d just advise someone to put a limit on the concurrent backfills, but `osd max backfills` already defaults to 1. Could it be that the real culprit here is that the hardware is heavily overbooked? 68 OSDs per node is far beyond what you should be running unless you have vast experience with Ceph and its memory requirements under stress.
I wonder if this cluster would even come online after an outage, or whether it would also crumble under peering and possible backfilling.
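
If you want to verify or pin that limit at runtime, something like the following should do it (just a sketch; injectargs syntax as used around Luminous, and osd.0 is only an example daemon):

  # check the current value on one OSD via its admin socket
  ceph daemon osd.0 config get osd_max_backfills

  # cap concurrent backfills on all OSDs at runtime
  ceph tell osd.* injectargs '--osd_max_backfills 1'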

To be honest I don’t even get why using the weight option would solve this. The same amount of data needs to be transferred anyway at some point; it seems like a poor man’s throttling mechanism. And if memory shortage is the problem here, due to, again, the many OSDs, then the reweight strategy will only give you slightly better odds.
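
For context, the 0-weight approach Serkan describes below boils down to adding the new OSDs with a CRUSH weight of 0 and raising it in small steps, roughly like this (osd.340 and the weights are purely illustrative; the linked ceph-gentle-reweight script automates the loop):

  # raise the CRUSH weight of a new OSD a little, wait for backfill to settle, repeat
  ceph osd crush reweight osd.340 0.5
  ceph osd crush reweight osd.340 1.0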

So:
1) I would keep track of memory usage on the nodes to see whether it increases under peering/backfilling.
  - If it does, and you’re using bluestore: try lowering the bluestore_cache_size* params to give yourself some leeway.
2) If using bluestore, try throttling recovery by changing the following params, depending on your environment (see the example commands after this list):
  - osd recovery sleep
  - osd recovery sleep hdd
  - osd recovery sleep ssd
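
A rough sketch of what tuning those might look like (values are illustrative only, adjust for your hardware; the recovery sleeps can be injected at runtime, while the bluestore cache sizes are usually set in ceph.conf and need an OSD restart):

  # add a small pause between recovery/backfill ops on HDD-backed OSDs
  ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0.2'

  # in ceph.conf, shrink the per-OSD bluestore cache (bytes, here 1 GiB), then restart the OSDs
  [osd]
  bluestore cache size hdd = 1073741824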

There are other throttling params you can change, though most defaults are just fine in my environment, and I don’t have experience with them.

Good luck, 

Hans


> On Apr 18, 2018, at 1:32 PM, Serkan Çoban <cobanserkan@xxxxxxxxx> wrote:
> 
> You can add the new OSDs with 0 weight and edit the script below to increase
> the OSD weights instead of decreasing them.
> 
> https://github.com/cernceph/ceph-scripts/blob/master/tools/ceph-gentle-reweight
> 
> 
> On Wed, Apr 18, 2018 at 2:16 PM, nokia ceph <nokiacephusers@xxxxxxxxx> wrote:
>> Hi All,
>> 
>> We have a 5-node cluster with EC 4+1. Each node has 68 HDDs. Now we are
>> trying to add a new node with 68 disks to the cluster.
>> 
>> We tried to add the new node and created all OSDs in one go; the cluster
>> stopped all client traffic and did only backfilling.
>> 
>> Is there a procedure to add the new node without affecting client traffic?
>> 
>> If we create the OSDs one by one, there is no issue with client traffic;
>> however, the time taken to add a new node with 68 disks would be several months.
>> 
>> Please provide your suggestions.
>> 
>> Thanks,
>> Muthu
>> 
>> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



