Re: Too many objects per pg than average: deadlock situation

Hello.

Reply inline

> On 21 May 2018, at 0:42, Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:
> 
> The recent retcon of the recommended PG:OSD ratio to 100 is mostly about RAM usage, as more people deploy / grow to larger clusters. 200 is the new hard limit; it used to be the recommendation.
> 
> Not familiar with gnocchi in this context, but I can’t imagine a serious problem happening if you were to split the PGs in the gnocchi pool, after bumping mon pg warn max per osd accordingly.

If I split the PGs on the gnocchi pool, we increase the number of PGs per OSD across the cluster and can hit the per-OSD PG limit.
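
To make that risk concrete, here is a back-of-the-envelope check (a minimal sketch in Python; the OSD count, pool sizes and replica counts below are made-up placeholders, not our real numbers): the per-OSD figure is just the sum of pg_num * replica size over all pools, divided by the number of OSDs.

# Rough per-OSD PG estimate before/after splitting one pool's PGs.
# All numbers are hypothetical placeholders, not values from this cluster.

def pgs_per_osd(pools, num_osds):
    """pools: list of (pg_num, replica_size) tuples."""
    total_pg_replicas = sum(pg_num * size for pg_num, size in pools)
    return total_pg_replicas / num_osds

num_osds = 30
pools_before = [(1024, 3), (512, 3), (64, 3)]   # e.g. vms, volumes, gnocchi
pools_after  = [(1024, 3), (512, 3), (512, 3)]  # gnocchi split 64 -> 512

print(pgs_per_osd(pools_before, num_osds))  # 160.0
print(pgs_per_osd(pools_after, num_osds))   # 204.8 -> past the 200/OSD limit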

> 
> I don’t understand 1. below — why would the data size matter?

According to the calculator at ceph.com/pgcalc, PGs are distributed between pools according to their predicted data size, not the number of objects in them.
So the situation of very many objects in a small pool is not addressed at all.
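
As far as I can tell, pgcalc's sizing step behaves roughly like this (a sketch under that assumption; the 30-OSD cluster and the data-share percentages are hypothetical): each pool gets (target PGs per OSD x OSD count x its share of the data) / replica size, rounded to a power of two, so the object count never enters the formula.

import math

def pgcalc_pg_num(target_pgs_per_osd, num_osds, data_share, replica_size):
    # Approximation of the ceph.com/pgcalc sizing rule: PGs are handed out
    # by each pool's expected fraction of the data, then rounded up to a
    # power of two.  Object counts play no part in it.
    raw = (target_pgs_per_osd * num_osds * data_share) / replica_size
    return 2 ** math.ceil(math.log2(max(raw, 1)))

# Hypothetical cluster: 30 OSDs, 100 target PGs per OSD, 3x replication.
print(pgcalc_pg_num(100, 30, 0.01, 3))  # gnocchi at ~1% of the data -> 16 PGs
print(pgcalc_pg_num(100, 30, 0.60, 3))  # a big block pool at ~60%   -> 1024 PGs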

> 
> 
> 
>> 
>> Hello!
>> 
>> In our cluster, we have run into a deadlock situation.
>> This is a standard cluster for OpenStack without RadosGW: we have the standard block-access pools and one pool for metrics from Gnocchi.
>> The amount of data in the gnocchi pool is small, but there are a great many objects.
>> 
>> When planning the distribution of PGs between pools, PGs are allocated according to the estimated data size of each pool. Accordingly, pgcalc suggests allocating only a small number of PGs to the gnocchi pool.
>> 
>> As a result, the cluster constantly sits with the warning "1 pools have many more objects per pg than average", and this is understandable: Gnocchi produces a lot of small objects, and compared with the rest of the pools its object count is tens of times larger.
>> 
>> And here we are at a deadlock:
>> 1. We cannot increase the number of PGs on the gnocchi pool, since its data size is very small.
>> 2. Even if we did increase the number of PGs, we could cross the recommended limit of 200 PGs per OSD in the cluster.
>> 3. Keeping the cluster permanently in HEALTH_WARN is a bad idea.
>> 4. We could raise the parameter "mon pg warn max object skew", but we do not know how Ceph will behave when there is one pool with a huge objects-per-PG skew.
>> 
>> There is no obvious solution.
>> 
>> How do we solve this problem correctly?
>> — 
>> Mike, runs!
> 

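Regarding item 4 in the quoted list: as I understand it, the check behind "1 pools have many more objects per pg than average" compares each pool's objects-per-PG with the cluster-wide average and fires when a pool exceeds the average by more than mon_pg_warn_max_object_skew (default 10, as far as I know). A minimal sketch of that logic, with made-up pool figures:

# Sketch of the "many more objects per pg than average" warning, as I understand it.
# Pool figures are hypothetical, not from our cluster.

mon_pg_warn_max_object_skew = 10.0  # believed to be the Ceph default

pools = {
    "volumes": {"objects": 500_000,   "pg_num": 1024},
    "vms":     {"objects": 200_000,   "pg_num": 512},
    "gnocchi": {"objects": 3_000_000, "pg_num": 64},
}

total_objects = sum(p["objects"] for p in pools.values())
total_pgs = sum(p["pg_num"] for p in pools.values())
avg_objects_per_pg = total_objects / total_pgs

for name, p in pools.items():
    objects_per_pg = p["objects"] / p["pg_num"]
    if objects_per_pg > mon_pg_warn_max_object_skew * avg_objects_per_pg:
        print(f"{name}: {objects_per_pg:.0f} objects/pg vs average "
              f"{avg_objects_per_pg:.0f} -> would trigger the warning")

With those placeholder numbers only the gnocchi-style pool trips the check, which matches what we see in practice.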