Re: estimate the impact of changing pg_num

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,
I don't know the general calculation, but last week we split a pool with 20 million tiny objects from 512 to 1024 pgs, on a cluster with 80 OSDs. IIRC around 7 million objects needed to move, and it took around 13 hours to finish. The bottleneck in our case was objects per second (limited to around 1000/s), not network throughput (which never exceeded ~50MB/s).

It wasn't completely transparent... the time to write a 4kB object increased from 5ms to around 30ms during this splitting process.

I would guess that if you split from 1k to 8k pgs, around 80% of your data will move. Basically, 7 out of 8 objects will be moved to a new primary PG, but any objects that end up with 2nd or 3rd copies on the first 1k PGs should not need to be moved.

I'd also be interested to hear of similar splitting experiences. We've been planning a similar intervention on our larger cluster to move from 4k PGs to 16k. I have been considering making the change gradually (10-100 PGs at a time) instead of all at once. This approach would certainly lower the performance impact, but would take much much longer to complete. I wrote a short script to perform this gentle splitting here:  https://github.com/cernceph/ceph-scripts/blob/master/tools/split/ceph-gentle-split

Be sure to understand what it's doing before trying it.

Cheers,
Dan

On 1 Feb 2015 18:21, "Xu (Simon) Chen" <xchenum@xxxxxxxxx> wrote:
Hi folks,

I was running a ceph cluster with 33 OSDs. More recently, 33x6 new OSDs hosted on 33 new servers were added, and I have finished balancing the data and then marked the 33 old OSDs out.

As I have 6x as many OSDs, I am thinking of increasing pg_num of my largest pool from 1k to at least 8k. What worries me is that this cluster has around 10M objects and is supporting many production VMs with RBD.

I am wondering if there is a good way to estimate the amount of data that will be shuffled after I increase the PG_NUM. I want to make sure this can be done within a reasonable amount of time, such that I can declare a proper maintenance window (either over night, or throughout a weekend..)

Thanks!
-Simon

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux