Re: 3x replicated rbd pool ssd data spread across 4 osd's

Well, you have more than one pool here.

pg_num = 8, size = 3 -> 24 PGs for rbd.ssd
The extra 48 PGs (your OSDs hold 72 in total) come from other pools.
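If you want to confirm where the rest come from, something along these
lines should show it (osd.19 is simply taken from the output below):

  ceph osd pool ls detail   # every pool with its pg_num and replication size
  ceph pg ls-by-osd 19      # PG ids are <pool-id>.<suffix>, so the prefix tells you the pool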

About the PG distribution, check out the balancer module.


tl;dr: the placement is computed by an algorithm (CRUSH), so it is
predictable (that is the point), but it is not perfect size-wise (there is
no "central point" that could take everything into account).
The balancer module addresses exactly that: it moves PGs around to get a
better distribution.
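A rough sketch of enabling it (adjust to your setup; upmap mode needs
luminous-or-newer clients, otherwise use crush-compat):

  ceph mgr module enable balancer
  ceph balancer mode upmap      # or: ceph balancer mode crush-compat
  ceph balancer on
  ceph balancer status          # check what it is doing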

On 09/02/2018 03:14 PM, Marc Roos wrote:
> 
> So that changes the question to: why is Ceph not distributing the PGs
> evenly across the four OSDs?
> 
> [@c01 ~]# ceph osd df |egrep '^19|^20|^21|^30'
> 19   ssd 0.48000  1.00000  447G   133G   313G 29.81 0.70  16
> 20   ssd 0.48000  1.00000  447G   158G   288G 35.40 0.83  19
> 21   ssd 0.48000  1.00000  447G   208G   238G 46.67 1.10  20
> 30   ssd 0.48000  1.00000  447G   149G   297G 33.50 0.79  17
> 
> rbd.ssd: pg_num 8 pgp_num 8 
> 
> I will look into the balancer, but I am still curious why these 8 PGs
> (8x8=64? + 8? = 72) are still not spread evenly. Why not 18 on every
> OSD?
> 
> -----Original Message-----
> From: Jack [mailto:ceph@xxxxxxxxxxxxxx] 
> Sent: zondag 2 september 2018 14:06
> To: ceph-users@xxxxxxxxxxxxxx
> Subject: Re:  3x replicated rbd pool ssd data spread across 
> 4 osd's
> 
> ceph osd df will give you more information: the variation and PG count
> for each OSD.
> 
> Ceph does not spread data on a per-object basis, but on a per-PG basis.
> 
> The data repartition is thus not perfect. You may increase your pg_num
> and/or use the mgr balancer module
> (http://docs.ceph.com/docs/mimic/mgr/balancer/).
> 
> 
> On 09/02/2018 01:28 PM, Marc Roos wrote:
>>
>> If I have only one RBD SSD pool, replicated 3 times, on 4 SSD OSDs, why
>> are the objects so unevenly spread across the four OSDs? Should they
>> not all have 162G?
>>
>>
>> [@c01 ]# ceph osd status 2>&1
>> +----+------+-------+-------+--------+---------+--------+---------+-----------+
>> | id | host |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
>> +----+------+-------+-------+--------+---------+--------+---------+-----------+
>> | 19 | c01  |  133G |  313G |    0   |     0   |    0   |     0   | exists,up |
>> | 20 | c02  |  158G |  288G |    0   |     0   |    0   |     0   | exists,up |
>> | 21 | c03  |  208G |  238G |    0   |     0   |    0   |     0   | exists,up |
>> | 30 | c04  |  149G |  297G |    0   |     0   |    0   |     0   | exists,up |
>> +----+------+-------+-------+--------+---------+--------+---------+-----------+
>>
>> All objects in the rbd pool are 4MB, are they not? It should be easy to
>> spread them evenly; what am I missing here?
>>
>>
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


