Uneven data distribution across OSDs

Hi ceph users,

I am using CephFS for file storage and I have noticed that the data gets distributed very unevenly across OSDs.
  • I have about 90 OSDs across 8 hosts and 4096 PGs for the cephfs_data pool with 2 replicas, which is in line with the total PG recommendation of “Total PGs = (OSDs * 100) / pool_size” from the docs (90 * 100 / 2 = 4500, rounded to the nearest power of two).
  • CephFS distributes the data pretty much evenly across the PGs, as shown by ‘ceph pg dump’.
  • However, the number of PGs assigned to the various OSDs (per weight unit/terabyte) varies quite a lot.  The fullest OSD has as many as 44 PGs per terabyte (weight unit), while the emptier ones have as few as 19 or 20.
  • Even if I count the total number of PGs for all pools per OSD, the spread is just as wide as for the cephfs_data pool alone.  The sketch after this list is roughly how I compute these per-OSD numbers.
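For reference, this is roughly the script I use to get the per-OSD counts, combining the acting sets from ‘ceph pg dump’ with the CRUSH weights from ‘ceph osd df’.  It is only a sketch: the JSON field names (“pg_stats”, “acting”, “nodes”, “crush_weight”) are what my version emits and may differ on other releases.

import json
import subprocess
from collections import defaultdict

def ceph_json(*args):
    # Run a ceph subcommand with JSON output and parse it.
    return json.loads(subprocess.check_output(("ceph",) + args + ("--format", "json")))

# Count PG instances (primary + replicas) landing on each OSD from the
# acting sets in 'ceph pg dump'.  Field names are as seen on my version.
pg_dump = ceph_json("pg", "dump")
pgs_per_osd = defaultdict(int)
for pg in pg_dump["pg_stats"]:
    for osd_id in pg["acting"]:
        pgs_per_osd[osd_id] += 1

# CRUSH weight per OSD (roughly terabytes with the default map) from 'ceph osd df'.
osd_df = ceph_json("osd", "df")
density = []
for node in osd_df["nodes"]:
    weight = node.get("crush_weight", 0)
    if weight > 0:
        density.append((node["name"], pgs_per_osd[node["id"]] / float(weight)))

density.sort(key=lambda t: t[1])
for name, pgs_per_tb in density:
    print("%-8s %6.1f PG instances per weight unit (TB)" % (name, pgs_per_tb))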
As a result, when the whole CephFS file system is at 60% full, some of the OSDs already reach the 95% full condition, and no more data can be written to the system.
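To spell out the arithmetic (this only assumes an OSD’s fill level scales linearly with its PGs per weight unit, since the data per PG is even):

# Numbers from above: the fullest OSD at 44 PGs/TB hits 95% full when the
# file system as a whole is 60% full.
fullest = 44          # PGs per TB on the fullest OSD
avg_util = 0.60       # overall file system utilization
full_ratio = 0.95     # the "full" threshold that OSD is hitting
implied_avg = fullest * avg_util / full_ratio
print(implied_avg)    # ~27.8 PGs/TB cluster average, so the fullest OSD carries ~1.6x the average
# At the same point an OSD with only 19-20 PGs/TB sits at roughly 41-43% used.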
Is there any way to force a more even distribution of PGs to OSDs?  I am using the default CRUSH map with two levels (root/host).  Can any changes to the CRUSH map help?  I would really like to get higher disk utilization than 60% without one of the 90 disks filling up so early.
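The only workaround I can think of so far is to manually nudge down the reweight of the fullest OSDs so that some of their PGs move elsewhere, along the lines of the sketch below (the OSD ids and weights are made-up examples), but that feels like treating the symptom rather than the uneven mapping itself.

import subprocess

# Hypothetical example only: lower the (non-CRUSH) reweight of a couple of
# the fullest OSDs so some of their PGs get remapped.  The OSD ids and the
# 0.90/0.85 weights are made up for illustration.
for osd_id, new_weight in [(17, 0.90), (42, 0.85)]:
    subprocess.check_call(["ceph", "osd", "reweight", str(osd_id), str(new_weight)])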

Thanks,

Andras

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
