Throttle pool pg_num/pgp_num increase impact

daniel.vanderster@xxxxxxx (Dan Van Der Ster) · Tue, 8 Jul 2014 17:14:55 +0000

Hi Greg,
We're also due for a similar splitting exercise in the not-too-distant future, and will also need to minimize the impact on latency.

In addition to increasing pg_num in small steps and using a minimal max_backfills/recoveries configuration, I was planning to increase pgp_num very slowly as well. In fact, I don't mind if the whole splitting exercise takes weeks to complete. Do you think that'd work, or are intermediate values for pgp_num somehow counterproductive?

Cheers, Dan
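
The stepwise plan described above can be sketched as a small Python helper that only prints the commands an operator might run. This is a minimal sketch under stated assumptions: the pool name "mypool" and the step size of 64 are hypothetical, and pgp_num is simply raised to match pg_num after each step. It does not answer whether intermediate pgp_num values are advisable; it only shows the bookkeeping. The `ceph osd pool set <pool> pg_num|pgp_num <n>` form is the standard CLI syntax.

```python
def step_schedule(current, target, step):
    """Return the intermediate values after `current`, ending exactly at `target`."""
    if step <= 0:
        raise ValueError("step must be positive")
    if target < current:
        raise ValueError("target must be >= current")
    values = []
    value = current
    while value < target:
        # Never overshoot the final target on the last step.
        value = min(value + step, target)
        values.append(value)
    return values


def split_commands(pool, current, target, step):
    """Build ceph CLI commands for a stepwise pg_num, then pgp_num, increase."""
    commands = []
    for value in step_schedule(current, target, step):
        # Split first (pg_num), then allow data movement (pgp_num).
        commands.append(f"ceph osd pool set {pool} pg_num {value}")
        commands.append(f"ceph osd pool set {pool} pgp_num {value}")
    return commands


if __name__ == "__main__":
    # "mypool" and the step of 64 are illustrative assumptions.
    for line in split_commands("mypool", 1800, 2048, 64):
        print(line)
```

In practice one would presumably wait for the cluster to settle between steps before issuing the next pair of commands; that pacing is left to the operator here.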

On Jul 8, 2014 7:01 PM, Gregory Farnum <greg at inktank.com> wrote:
The impact won't be 300 times bigger, but it will be bigger. There are two things impacting your cluster here:
1) the initial "split" of the affected PGs into multiple child PGs. You can mitigate this by stepping through pg_num at small multiples.
2) the movement of data to its new location (when you adjust pgp_num). This can be throttled by setting "osd max backfills" and related parameters; check the docs.
-Greg
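
As an illustration of the second point, the backfill and recovery limits can be lowered at runtime with `ceph tell osd.* injectargs`. The Python sketch below only assembles that command string. The values of 1 are an assumption chosen for illustration, not a recommendation from the thread, and `osd recovery max active` is included as one plausible example of the "related parameters".

```python
def throttle_command(max_backfills=1, recovery_max_active=1):
    """Build a `ceph tell` command that lowers backfill/recovery concurrency on all OSDs."""
    for name, value in (("max_backfills", max_backfills),
                        ("recovery_max_active", recovery_max_active)):
        if value < 1:
            raise ValueError(f"{name} must be at least 1")
    # Option names follow the ceph.conf settings "osd max backfills" and
    # "osd recovery max active"; the default values here are illustrative.
    args = (f"--osd-max-backfills {max_backfills} "
            f"--osd-recovery-max-active {recovery_max_active}")
    return f"ceph tell osd.* injectargs '{args}'"


if __name__ == "__main__":
    print(throttle_command())
```

The same settings can be made persistent in the [osd] section of ceph.conf; injected values apply only to the running daemons.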

On Tuesday, July 8, 2014, Kostis Fardelas <dante1234 at gmail.com> wrote:
Hi,
We maintain a cluster with 126 OSDs, replication 3, and approx. 148T of
raw used space. We store data objects mainly in two pools, one of which
is approx. 300x larger than the other in both data stored and number of
objects. Based on the formula provided here
http://ceph.com/docs/master/rados/operations/placement-groups/ we
computed that we need to increase our per-pool pg_num & pgp_num to
approx. 6300 PGs / pool (100 * 126 / 2).
We started by increasing the pg & pgp number on the smaller pool from
1800 to 2048 PGs (first the pg_num, then the pgp_num) and we
experienced a 10X increase in Ceph total operations and an approx. 3X
disk latency increase in some underlying OSD disks. At the same time,
for approx. 10 seconds we experienced very low values of client I/O and
op/s.

Should we be worried that the pg/pgp num increase on the bigger pool
will have a 300X larger impact?
Can we throttle this impact by injecting any thresholds or applying an
appropriate configuration in our ceph.conf?

Regards,
Kostis
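
The arithmetic behind the PG estimate in the message above can be reproduced with a short Python sketch. The function names are hypothetical. The divisor is kept as a parameter because the message divides by 2, while the stated replication factor is 3; both results are computed and the thread does not say which applies. Rounding up to a power of two is an assumption about the linked documentation's guidance, not something stated in the message.

```python
def suggested_pg_count(num_osds, divisor, pgs_per_osd=100):
    """Estimate total PGs as (OSDs * PGs-per-OSD) / divisor, using integer division."""
    if divisor < 1:
        raise ValueError("divisor must be at least 1")
    return num_osds * pgs_per_osd // divisor


def next_power_of_two(n):
    """Return the smallest power of two that is >= n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    as_written = suggested_pg_count(126, 2)        # 6300, the figure in the message
    with_replication = suggested_pg_count(126, 3)  # 4200, using replication 3
    print(as_written, next_power_of_two(as_written))
    print(with_replication, next_power_of_two(with_replication))
```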

--
Software Engineer #42 @ http://inktank.com | http://ceph.com