Re: Uneven data distribution across OSDs

Hello Andras,

Some initial observations and questions: 

The total PG recommendation for this cluster would actually be 8192 PGs per the formula:

Total PGs = (90 * 100) / 2 = 4500

Next power of 2 = 8192

The result should be rounded up to the nearest power of two; rounding up is optional, but it is recommended so that CRUSH can evenly balance the number of objects among the placement groups.
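
For reference, here is a minimal sketch of that calculation in plain Python (nothing Ceph-specific; the 90 OSDs and 2 replicas are taken from your description):

    # PG count heuristic from the docs: (OSDs * 100) / pool_size,
    # rounded up to the next power of two.
    def recommended_pg_count(num_osds, pool_size, pgs_per_osd=100):
        raw = num_osds * pgs_per_osd / float(pool_size)   # 90 * 100 / 2 = 4500
        power = 1
        while power < raw:
            power *= 2          # 4096 < 4500, so the loop continues up to 8192
        return power

    print(recommended_pg_count(90, 2))   # prints 8192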

How many data pools are being used for storing objects?

'ceph osd dump |grep pool'

Also, how are these 90 OSDs laid out across the 8 hosts, and is there any discrepancy between disk sizes and weights?

'ceph osd tree'

Also, what CRUSH tunables are you using, and which Ceph release?

'ceph osd crush show-tunables'
'ceph -v'
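
If you want to put numbers on the imbalance before changing anything, something along these lines should work. It is only a rough sketch: it assumes 'ceph pg dump --format json' returns a top-level 'pg_stats' list with an 'up' OSD set per PG, and that 'ceph osd tree --format json' returns a 'nodes' list with a 'crush_weight' per OSD entry; the exact JSON layout can vary between releases, so treat it as illustrative rather than definitive.

    import json, subprocess
    from collections import defaultdict

    def ceph_json(*args):
        # Run a ceph command and parse its JSON output.
        out = subprocess.check_output(['ceph'] + list(args) + ['--format', 'json'])
        return json.loads(out)

    pg_dump = ceph_json('pg', 'dump')
    osd_tree = ceph_json('osd', 'tree')

    # Count how many PG copies land on each OSD.
    pgs_per_osd = defaultdict(int)
    for pg in pg_dump['pg_stats']:
        for osd in pg['up']:
            pgs_per_osd[osd] += 1

    # CRUSH weight per OSD (by default roughly 1.0 per terabyte).
    weights = {n['id']: n.get('crush_weight', 0.0)
               for n in osd_tree['nodes'] if n['type'] == 'osd'}

    for osd, count in sorted(pgs_per_osd.items()):
        w = weights.get(osd) or 1.0   # fall back to 1.0 to avoid dividing by zero
        print('osd.%d  pgs=%d  pgs/weight=%.1f' % (osd, count, count / w))

A spread like the 19-to-44 PGs per weight unit you describe should show up directly in the last column.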

Thanks,

----- Original Message -----
From: "Andras Pataki" <apataki@xxxxxxxxxxxxxxxxxxxx>
To: ceph-users@xxxxxxxxxxxxxx
Sent: Monday, September 21, 2015 2:00:29 PM
Subject:  Uneven data distribution across OSDs

Hi ceph users, 

I am using CephFS for file storage and I have noticed that the data gets distributed very unevenly across OSDs. 


    * I have about 90 OSDs across 8 hosts, and 4096 PGs for the cephfs_data pool with 2 replicas, which is in line with the total PG recommendation of “Total PGs = (OSDs * 100) / pool_size” from the docs. 
    * CephFS distributes the data pretty much evenly across the PGs as shown by ‘ceph pg dump’ 
    * However, the number of PGs assigned to the various OSDs (per weight unit/terabyte) varies quite a lot: the fullest OSD has as many as 44 PGs per terabyte (weight unit), while the emptier ones have as few as 19 or 20. 
    * Even if I consider the total number of PGs for all pools per OSD, the number varies similarly wildly (as with the cephfs_data pool only). 
As a result, when the whole CephFS file system is at 60% full, some of the OSDs already reach the 95% full condition, and no more data can be written to the system. 
Is there any way to force a more even distribution of PGs to OSDs? I am using the default crush map, with two levels (root/host). Can any changes to the crush map help? I would really like to get higher disk utilization than 60% without one of the 90 disks filling up so early. 

Thanks, 

Andras 


-- 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



