We regrettably have to increase PGs in a Ceph cluster this way more often than anyone should ever need to. As such, we have scripted it out. A basic version of the script that should work for you is below.
First, create a function that checks for any PG states you don't want to proceed through while any PGs are in them (better than duplicating code). Second, set the flags so your cluster doesn't die while you do this. Third, set your current and destination PG counts in the for loop; the loop will skip any number not divisible by 256. As you've found, increasing by 256 at a time is a good number. More than that and you'll run into issues of your cluster curling into a fetal position and crying. The loop increases your pg_num, waits until everything has settled, then increases your pgp_num. The seemingly excessive sleeps are there to give the cluster a chance to resolve the blocked requests that will still happen during this. Lastly, unset the flags to let the cluster start moving the data around.

One thing to note: in a cluster with 800-1000 HDD OSDs with SSD journals, going from 16k to 32k PGs, we set max backfills to 1 during busy times and 2 during idle times. A max backfills value above 2 has not been beneficial for us when increasing our PG count; we tested max backfills of 2 and 5, and both took the entire weekend to add 4k PGs. We also do not add all of the PGs at once. We do 4k each weekend and 2k during the week, waiting for the cluster to finish each time to give our mon stores a chance to compact before we continue.

check_health(){
    #If this finds any of the strings in the grep, it will return 0; otherwise it returns 1 (or whatever the grep return code is)
    ceph health | grep 'peering\|stale\|activating\|creating\|down' > /dev/null
    return $?
}

for flag in nobackfill norecover noout nodown
do
    ceph osd set $flag
done

#Set your current and destination pg counts here.
for num in {2048..16384}
do
    #Only act on multiples of 256
    [ $(( $num % 256 )) -eq 0 ] || continue

    while sleep 10
    do
        check_health
        if [ $? -ne 0 ]
        then
            #This assumes your pool is named rbd
            ceph osd pool set rbd pg_num $num
            break
        fi
    done
    sleep 60

    while sleep 10
    do
        check_health
        if [ $? -ne 0 ]
        then
            #This assumes your pool is named rbd
            ceph osd pool set rbd pgp_num $num
            break
        fi
    done
    sleep 60
done

for flag in nobackfill norecover noout nodown
do
    ceph osd unset $flag
done
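One way to flip the max backfills value mentioned above on the fly is with injectargs; something like the following should do it (this is a runtime-only change and won't survive an OSD restart, so put osd_max_backfills under [osd] in ceph.conf if you want it to persist):

#Busy hours: keep recovery/backfill traffic to a minimum
ceph tell osd.* injectargs '--osd-max-backfills 1'

#Idle/overnight windows: allow a bit more
ceph tell osd.* injectargs '--osd-max-backfills 2'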
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Matteo Dacrema [mdacrema@xxxxxxxx]
Sent: Monday, September 19, 2016 2:51 AM
To: Will.Boege; ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] [EXTERNAL] Re: Increase PG number

Hi,
I have 3 different clusters.
The first I was able to upgrade from 1024 to 2048 PGs with 10 minutes of "IO freeze".
The second I was able to upgrade from 368 to 512 in a second without any performance issue, but from 512 to 1024 it took over 20 minutes to create the PGs.
The third one I have to upgrade is now at 2048 PGs and I have to take it to 16384. So what I'm wondering is how to do it with minimal performance impact.
Maybe the best way is to increase pg_num and pgp_num by 256 at a time, letting the cluster rebalance each time.
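For a single 256 increment that would look roughly like this (assuming the pool is named rbd and starting from 2048; wait for the cluster to settle back to HEALTH_OK, with all PGs active+clean, between and after the two steps):

ceph osd pool set rbd pg_num 2304
#wait for the new PGs to be created and peer, then:
ceph osd pool set rbd pgp_num 2304
#wait for backfill/rebalance to finish before the next increment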
Thanks
Matteo