Re: PG explosion with erasure codes, power of two and "x pools have many more objects per pg than average"

Answers inline.

2018-05-25 17:57 GMT+02:00 Jesus Cea <jcea@xxxxxxx>:
Hi there.

I have configured a pool with an 8+2 erasure code. By space usage and
OSD configuration my target would be 128 PGs, but since each
configured PG will be using 10 actual "PGs" (shards), I have created
the pool with only 8 PGs (80 real PGs). Since I can increase the PG
count but not decrease it, this decision seems sensible.

Some questions:

1. The documentation insists everywhere that the PG count should be a
power of two. It would be nice to know the consequences of not
following this recommendation, whether being "close" to a power of two
is better than being far from one, and whether it is better to be
slightly below or slightly above. If the ideal value is 128 but I can
only have 120 or 130, which should I choose, 120 or 130? Why?

Go for the next larger power of two under the assumption that your cluster will grow.
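
As a rough illustration, the usual rule of thumb is to aim for on the
order of 100 PG shards per OSD and round the result up to the next
power of two. A minimal Python sketch (the ~100/OSD target and the
assumption that this pool holds most of the cluster's data are mine,
adjust for your setup):

def next_power_of_two(n):
    # smallest power of two >= n
    p = 1
    while p < n:
        p *= 2
    return p

def suggested_pg_num(num_osds, k_plus_m, target_pgs_per_osd=100):
    # spread ~target_pgs_per_osd PG shards over all OSDs, then divide
    # by the shards each PG creates (k+m for an EC pool)
    raw = num_osds * target_pgs_per_osd / k_plus_m
    return next_power_of_two(int(raw))

print(suggested_pg_num(25, 10))   # raw 250 -> rounded up to 256

With a smaller per-OSD target, or if this pool only holds part of the
cluster's data, the same calculation lands on your 128.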
 

2. As I understand it, the PG count that should be a power of two is
the "8" in this case (80 real PGs underneath). Good. The next step
would then be 16 (160 real PGs). I would rather increase it to 12 or
13 (120/130 real PGs). Would that be reasonable? What are the
consequences of increasing the PG count to 12 or 13 instead of
choosing 16 (the next power of two)?

Data will be poorly balanced across the PGs if the PG count is not a
power of two.
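
To make that concrete, here is a small Python model of the hash-to-PG
folding Ceph does (a sketch in the spirit of ceph_stable_mod, not the
real placement code, and it ignores CRUSH entirely):

import random
from collections import Counter

def pg_mask(pg_num):
    # smallest (2^n - 1) that covers pg_num - 1
    return (1 << (pg_num - 1).bit_length()) - 1

def stable_mod(x, b, bmask):
    # hash values that fall beyond pg_num are folded back into the
    # lower half of the mask
    return x & bmask if (x & bmask) < b else x & (bmask >> 1)

def pg_counts(pg_num, objects=100_000):
    rng = random.Random(42)
    return Counter(stable_mod(rng.getrandbits(32), pg_num, pg_mask(pg_num))
                   for _ in range(objects))

for pg_num in (12, 16):
    c = pg_counts(pg_num)
    print(pg_num, min(c.values()), max(c.values()))

With pg_num=12 a third of the PGs end up with roughly twice as many
objects as the rest; with pg_num=16 the counts are nearly uniform.
That factor-of-two imbalance is the practical cost of choosing 12 or
13 instead of 16.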
 

3. Is there any negative effect on CRUSH from using an 8+2 erasure
code instead of 6+2 or 14+2 (where k+m is a power of two)? I have 25
OSDs, so requiring 16 of them for a single operation seems like a bad
idea, even more so when my OSD capacities vary widely (from 150 GB to
1 TB) and filling a small OSD would block writes to the entire pool.

EC rules don't have to use powers of two. And yes, too many chunks for
an EC pool is a bad idea; it's rarely advisable to have a total of
k + m larger than 8 or so.

Also, you should have at least k + m + 1 servers, otherwise full server
failures cannot be handled properly.

A large spread between the OSD capacities within one crush rule is
also usually a bad idea; 150 GB to 1 TB is typically too big.
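
If you want to sanity-check a profile against those two rules of thumb
(k + m not much above ~8, and at least k + m + 1 hosts), a trivial
sketch; the host count below is hypothetical since you only mentioned
OSDs:

def check_ec_profile(k, m, num_hosts, max_total=8):
    # thresholds are the rules of thumb above, not limits enforced by Ceph
    warnings = []
    if k + m > max_total:
        warnings.append(f"k+m = {k + m} chunks is wider than ~{max_total}")
    if num_hosts < k + m + 1:
        warnings.append(f"{num_hosts} hosts < k+m+1 = {k + m + 1}; a full "
                        "host failure cannot be handled cleanly")
    return warnings

print(check_ec_profile(k=8, m=2, num_hosts=10))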
 

4. Since I created the erasure-coded pool with 8 PGs, I am getting
warnings of "x pools have many more objects per pg than average". The
data I am copying comes from a legacy pool with pg_num=512; the new
pool has 8 PGs. That results in ~30,000 objects per PG, far above the
average (616 objects). What can I do? Moving to 16 or 32 PGs is not
going to improve the situation, but it will consume PGs (32*10).
Advice?

Well, you reduced the number of PGs by a factor of 64, so you'll of
course see a large skew here. The option mon_pg_warn_max_object_skew
controls when this warning is shown; the default is 10.
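
As I understand it, the check simply compares a pool's objects-per-PG
against the cluster-wide average; a sketch with your numbers (the
monitor's exact bookkeeping may differ):

def exceeds_skew_warning(pool_objects_per_pg, avg_objects_per_pg,
                         max_skew=10.0):
    # warn once a pool exceeds max_skew times the cluster average,
    # which is the role mon_pg_warn_max_object_skew plays
    return pool_objects_per_pg > max_skew * avg_objects_per_pg

print(round(30_000 / 616, 1))              # ~48.7x the average
print(exceeds_skew_warning(30_000, 616))   # True -> health warning

If the skew is expected (a large pool that deliberately has few PGs),
raising mon_pg_warn_max_object_skew on the monitors silences the
warning.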
 

5. I understand the advice of having <300 PGs per OSD because of
memory usage, but I am wondering about the impact of the number of
objects in each PG. Memory- and resource-wise, is having 100 PGs with
10,000 objects each far more demanding than 1,000 PGs with 50 objects
each? Since I have PGs with 300 objects and PGs with 30,000 objects, I
wonder about the memory impact of each. What is the actual
memory-hungry factor in an OSD: the number of PGs, or the number of
objects per PG?

PGs typically impose a bigger overhead. But PGs with a large number of objects
can become annoying...


Paul
 

Thanks for your time and knowledge :).

--
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea@xxxxxxx - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea@xxxxxxxxxx  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz






--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
