Throttle pool pg_num/pgp_num increase impact

dante1234@xxxxxxxxx (Kostis Fardelas) · Wed, 9 Jul 2014 09:15:49 +0300

Hi Greg,
thanks for your immediate feedback. My comments follow.

Initially we thought that the 248 PG (15%) increment we used was
really small, but it seems that we should increase PGs in even smaller
increments. I don't think "multiples" is the appropriate term here; I
fear someone would assume that going from 10 PGs to 20 PGs is the same
as (or just as safe as) going from 1000 PGs to 2000 PGs, just because
both use a small 2X multiple.
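
To illustrate the difference between a relative multiple and an absolute
step size, here is a minimal Python sketch. The function name pg_steps and
the cap of 64 new PGs per step are hypothetical and chosen only for
illustration; they are not a recommendation from this thread or from the
Ceph docs.

```python
def pg_steps(current, target, max_step):
    """Return the pg_num values to step through from current up to target,
    creating at most max_step new PGs in any single step."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    if target < current:
        raise ValueError("this sketch only handles increases")
    steps = []
    while current < target:
        # Advance by the absolute cap, but never overshoot the target.
        current = min(current + max_step, target)
        steps.append(current)
    return steps


if __name__ == "__main__":
    # The same 2X multiple implies very different amounts of new PGs.
    for start in (10, 1000):
        print(f"{start} -> {start * 2}: {start * 2 - start} new PGs")
    # The 1800 -> 2048 change discussed here, with an assumed cap of 64.
    print(pg_steps(1800, 2048, 64))
```

Each value in the returned list would be applied to pg_num (and then
pgp_num) in turn, waiting for the cluster to settle before the next step.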

Regarding the data movement due to the pgp_num increase, we had already
set osd_max_backfills, osd_recovery_max_active,
osd_recovery_op_priority and osd_recovery_threads to their minimum values,
but we were still impacted. The first two are also set in ceph.conf, but
we usually change all four of them at runtime (by injecting them). Is
there anything else we should check? Is this a known issue?
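
For reference, the runtime injection mentioned above can be sketched as a
small Python helper that only builds the "ceph tell ... injectargs" command
line and does not run it. The helper name injectargs_command and the value
1 for each option are assumptions made for illustration; the actual minimum
values should be checked against the docs for the Ceph release in use.

```python
import shlex

# The four options named in this thread. The value 1 for each is an
# ASSUMED minimum; verify it for your Ceph release before using it.
THROTTLE = {
    "osd_max_backfills": 1,
    "osd_recovery_max_active": 1,
    "osd_recovery_op_priority": 1,
    "osd_recovery_threads": 1,
}


def injectargs_command(settings, target="osd.*"):
    """Build (but do not execute) a 'ceph tell <target> injectargs' argv."""
    args = " ".join(
        f"--{name.replace('_', '-')} {value}" for name, value in settings.items()
    )
    return ["ceph", "tell", target, "injectargs", args]


if __name__ == "__main__":
    # shlex.join quotes the arguments so the line can be pasted into a shell.
    print(shlex.join(injectargs_command(THROTTLE)))
```

Settings injected this way do not survive an OSD restart, which is why it
helps to keep the same values in ceph.conf as well.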

Another question that came up from our exercise is related to pool
isolation during PG remapping. As I reported, we only changed the
pg_num/pgp_num of one of our pools, but Ceph client I/O and ops seem to
have dropped at the cluster level (verified by looking at ceph status).
Was our second pool impacted too, or should we take for granted
that the pools are indeed isolated during remapping and that this is a
granularity issue in the ceph status view?

Regards,
Kostis

On 8 July 2014 20:01, Gregory Farnum <greg at inktank.com> wrote:
> The impact won't be 300 times bigger, but it will be bigger. There are two
> things impacting your cluster here:
> 1) the initial "split" of the affected PGs into multiple child PGs. You can
> mitigate this by stepping through pg_num at small multiples.
> 2) the movement of data to its new location (when you adjust pgp_num). This
> can be adjusted by setting the "OSD max backfills" and related parameters;
> check the docs.
> -Greg
>
>
> On Tuesday, July 8, 2014, Kostis Fardelas <dante1234 at gmail.com> wrote:
>>
>> Hi,
>> we maintain a cluster with 126 OSDs, replication 3 and approx. 148T raw
>> used space. We store data objects mainly in two pools, one of them
>> being approx. 300x larger than the other in terms of data stored and
>> number of objects. Based on the formula provided here
>> http://ceph.com/docs/master/rados/operations/placement-groups/ we
>> computed that we need to increase our per-pool pg_num & pgp_num to
>> approx. 6300 PGs / pool (100 * 126 / 2).
>> We started by increasing the pg_num and pgp_num of the smaller pool from
>> 1800 to 2048 PGs (first the pg_num, then the pgp_num) and we
>> experienced a 10X increase in Ceph total operations and an approx. 3X
>> disk latency increase on some underlying OSD disks. At the same time,
>> for approx. 10 seconds we experienced very low values of client I/O and
>> op/s.
>>
>> Should we be worried that the pg/pgp num increase on the bigger pool
>> will have a 300X larger impact?
>> Can we throttle this impact by injecting any thresholds or applying an
>> appropriate configuration on our ceph conf?
>>
>> Regards,
>> Kostis
>
>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com