Re: Could you please explain the PG concept

Absolutely.

Moreover, PGs are not a unit of size; they are a logical grouping of smaller RADOS objects, because a few thousand PGs are far easier and less expensive to manage than tens or hundreds of millions of small underlying RADOS objects.  They exist for efficiency and have no set size in bytes.
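To make the grouping idea concrete, here is a toy sketch.  It is NOT Ceph's actual placement code: the real mapping uses Ceph's own hash and CRUSH, while this stand-in uses CRC32 and a plain modulo.  The function name and the object names are made up for illustration.  The point is only that the number of PGs is fixed per pool, regardless of how many objects (or bytes) land in them.

```python
import zlib
from collections import Counter


def toy_pg_for_object(object_name: str, pg_num: int) -> int:
    """Map an object name to a PG index (toy stand-in, not Ceph's real hash)."""
    return zlib.crc32(object_name.encode("utf-8")) % pg_num


pg_num = 8  # hypothetical, deliberately tiny pool
objects = [f"obj-{i}" for i in range(100_000)]  # made-up object names

# Count how many objects land in each PG.
per_pg = Counter(toy_pg_for_object(name, pg_num) for name in objects)

for pg in sorted(per_pg):
    print(f"pg {pg}: {per_pg[pg]} objects")
```

However many objects you add, the cluster still only tracks `pg_num` groups for that pool; the PGs just hold more objects each.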

With respect to PG calculators, conventional wisdom is for the number of PGs in a given pool to be a power of 2:  1024, 2048, 4096, etc.  The reasons for this aren’t as impactful as they were with previous releases, but it still has benefits and is good practice.  That’s one reason why a pool forecast for 15% and one for 18% may recommend the same number of PGs: the usual practice is to round to the nearest power of 2.  So if calculations suggest, say, 903 PGs for 15% and 1138 for 18%, both will round to the same 1024.
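A minimal sketch of that rounding step, assuming "nearest" means the power of 2 closest in absolute distance, with ties going to the larger value.  The function name is mine, and a given calculator may apply extra rules on top of this.

```python
def nearest_power_of_two(n: int) -> int:
    """Return the power of 2 closest to n (ties go to the larger power)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    lower = 1 << (n.bit_length() - 1)  # largest power of 2 <= n
    upper = lower << 1                 # smallest power of 2 > n
    return lower if (n - lower) < (upper - n) else upper


print(nearest_power_of_two(903))   # 1024
print(nearest_power_of_two(1138))  # 1024
```

Both example values from above end up at 1024, which is why two different percentages can produce the same PG count.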

If you only have one pool in your cluster, which usually means you’re only using RBD, then the calculations are very simple.  When you have multiple pools, it becomes more complicated because you’re solving for the number of PG replicas that end up on each OSD, which involves the (potentially different) replication factor of each pool, the relative expected capacities of the pools, the use of each pool (an RGW pool and an RBD pool experience different workloads), etc.
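As a rough sketch of the multi-pool arithmetic, the following assumes the commonly cited calculator formula: (target PGs per OSD x number of OSDs x pool's share of data) / replication factor, then rounded to a power of 2.  The target of 100 PGs per OSD, the pool names, the 20-OSD cluster, and the 80/20 split are all assumptions for illustration, and the sketch ignores workload differences between pools.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    size: int             # replication factor
    data_fraction: float  # expected share of cluster data, 0.0 to 1.0


def nearest_power_of_two(n: int) -> int:
    """Return the power of 2 closest to n (ties go to the larger power)."""
    lower = 1 << (n.bit_length() - 1)
    upper = lower << 1
    return lower if (n - lower) < (upper - n) else upper


def suggest_pg_num(pool: Pool, osd_count: int, target_pgs_per_osd: int = 100) -> int:
    """Estimate pg_num for one pool (assumed formula, see lead-in)."""
    raw = target_pgs_per_osd * osd_count * pool.data_fraction / pool.size
    return nearest_power_of_two(max(1, round(raw)))


osd_count = 20  # hypothetical cluster
pools = [Pool("rbd", 3, 0.80), Pool("rgw-data", 3, 0.20)]  # hypothetical pools

suggestions = {p.name: suggest_pg_num(p, osd_count) for p in pools}

# What you are really solving for: PG replicas that land on each OSD.
replicas_per_osd = sum(suggestions[p.name] * p.size for p in pools) / osd_count

print(suggestions)        # {'rbd': 512, 'rgw-data': 128}
print(replicas_per_osd)   # 96.0
```

Here the two pools together put about 96 PG replicas on each OSD, close to the assumed target of 100.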

> On Apr 25, 2023, at 7:45 PM, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> Hi Wodel,
> 
> The simple explanation is that PGs are a level of storage abstraction above
> the drives (OSDs) and below pools and the objects they hold.  The links
> below may be helpful.  PGs consume resources, so they should be planned as
> best you can.  You can now scale them up and down, and use the autoscaler,
> so you don't have to be spot on right away.  PGs peer up and replicate data
> according to your chosen CRUSH rules.
> 
> https://ceph.io/en/news/blog/2014/how-data-is-stored-in-ceph-cluster/
> 
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/1.3/html/storage_strategies_guide/placement_groups_pgs
> 
> https://www.sebastien-han.fr/blog/2012/10/15/ceph-data-placement/
> --
> Alex Gorbachev
> ISS Storcium
> 
> 
> 
> On Tue, Apr 25, 2023 at 6:10 PM wodel youchi <wodel.youchi@xxxxxxxxx> wrote:
> 
>> Hi,
>> 
>> I am learning Ceph and I am having a hard time understanding PGs and PG
>> calculation.
>> 
>> I know that a PG is a collection of objects, and that PGs are replicated
>> across the hosts to respect the replication size, but...
>> 
>> In traditional storage, we use sizes in GB, TB and so on: we create a pool
>> from a bunch of disks or RAID arrays of some size, then we create volumes of
>> a certain size and use them. If the storage is full we add disks, then we
>> extend our pools/volumes.
>> The idea of size is simple to understand.
>> 
>> Ceph supports the notion of pool size in GB, TB, etc., but pools are
>> created using PGs, and now there is also the notion of % of data.
>> 
>> When I use the PG calc from Ceph or from Red Hat, the generated yml file
>> contains the % variable, but the commands file contains only the PGs, and
>> when configuring, 15% and 18% give the same number of PGs!?
>> 
>> The PG calc encourages you to enter %data values that add up to 100; in
>> other words, it assumes that you know all your pools from the start. What
>> if you won't consume all your raw disk space?
>> What happens when you need to add a new pool?
>> 
>> Also, when you create several pools and then execute ceph osd df tree, you
>> can see that all pools show the raw size as free space; it is as if all
>> pools share the same raw space regardless of their PG number.
>> 
>> I would appreciate it if someone could shed some light on this concept and
>> how to manage it wisely, because the documentation keeps saying that it's an
>> important concept and that you have to pay attention when choosing the
>> number of PGs for a pool from the start.
>> 
>> Regards.
>> 
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> 



