Unbalanced Cluster

Hi Everyone,

I'm looking for a bit of guidance on a system of 9 servers with 16 OSDs 
per server, 144 OSDs in total.

This cluster currently has 143 OSDs in it, but ceph osd df shows that 
they are very unbalanced in their utilization: some are around 50% full, 
yet others are pushing 85%. The balancer was on, and it was making 
things worse. At one point we had an OSD at 89.7%, which nearly 
triggered a full cluster.
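
In case it helps, here's how the spread can be summarized. This is a 
minimal Python sketch; it assumes "ceph osd df --format json" is run on 
an admin node and that the JSON contains a "nodes" list with 
"utilization" and "pgs" fields (field names may differ between releases):

import json
import statistics
import subprocess

# Pull per-OSD stats straight from the cluster; --format json avoids
# screen-scraping the table output.
raw = subprocess.check_output(["ceph", "osd", "df", "--format", "json"])
nodes = json.loads(raw)["nodes"]

# "utilization" is the percent-used column ceph osd df prints; "pgs" is
# the PG count per OSD (field names assumed, check your release's JSON).
utils = [n["utilization"] for n in nodes if n.get("utilization", 0) > 0]
pgs = [n["pgs"] for n in nodes]

print("OSDs reporting:", len(utils))
print("util min/mean/max: %.1f / %.1f / %.1f %%"
      % (min(utils), statistics.mean(utils), max(utils)))
print("util stddev: %.1f" % statistics.stdev(utils))
print("PGs per OSD min/mean/max: %d / %.0f / %d"
      % (min(pgs), statistics.mean(pgs), max(pgs)))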

We're running CephFS on an erasure-coded data pool with k=7, m=2. We 
have two options for proceeding, and I'm not sure which is the 
appropriate next step.

Option 1: Add another server. I'm currently unconvinced that this will 
help much, given how large the skew is; I fear the cluster will just 
continue to be skewed even if we add another 16 or so OSDs (rough 
arithmetic sketched below).
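
For what it's worth, here's the back-of-the-envelope arithmetic behind 
that worry. The average fill and worst-OSD numbers are illustrative 
assumptions, not measured values from this cluster:

# Back-of-the-envelope arithmetic: the average fill and worst-OSD figures
# below are illustrative assumptions, not measurements from this cluster.
osds_now = 143
osds_after = 143 + 16          # after adding one more 16-OSD server

avg_fill_now = 0.67            # assumed cluster-wide average utilization
fullest_now = 0.85             # roughly the worst OSD today
skew = fullest_now / avg_fill_now

# If the relative skew stays the same, the fullest OSD only drops in
# proportion to the added raw capacity.
avg_fill_after = avg_fill_now * osds_now / osds_after
fullest_after = avg_fill_after * skew

print("average fill: %.0f%% -> %.0f%%" % (100 * avg_fill_now, 100 * avg_fill_after))
print("fullest OSD:  %.0f%% -> %.0f%%" % (100 * fullest_now, 100 * fullest_after))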

Option 2: I've read that the usual reason for skewed OSD usage is having 
too few PGs in the pool. Our data pool's pg_num is currently set to 512; 
I believe the effective number of placements is multiplied when using 
erasure coding, which is why we kept it at only 512. If I recall 
correctly, with k=7, m=2 each PG is placed on k+m=9 OSDs, so 512 PGs is 
more like 512*9=4608 PG placements. And is it safe to suddenly double 
the number of PGs on a cluster that is this close to going full?
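
To make the PG arithmetic concrete, here's a small sketch (assuming 
pg_num=512 on the EC data pool, k=7/m=2 so each PG lands on k+m=9 OSDs, 
143 OSDs, and that this pool holds essentially all of the data):

import math

# Assumptions: pg_num=512 on the EC data pool, k=7/m=2 (each PG places
# k+m=9 shards on distinct OSDs), 143 OSDs, and this pool holds nearly
# all of the data in the cluster.
pg_num = 512
k, m = 7, 2
osds = 143

shards = pg_num * (k + m)       # total PG shards to place: 4608
pgs_per_osd = shards / osds     # average shards per OSD: ~32

# With roughly random placement, the relative spread in per-OSD load
# scales like 1/sqrt(PGs per OSD), so few PGs per OSD means large
# utilization skew.
rel_spread = 1 / math.sqrt(pgs_per_osd)

print("total PG shards:", shards)
print("average PG shards per OSD: %.1f" % pgs_per_osd)
print("expected relative spread: ~%.0f%%" % (100 * rel_spread))

# Doubling pg_num to 1024 gives ~64 shards per OSD and cuts the expected
# spread by about 1/sqrt(2).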

Here's a chart of our 12 highest-usage OSDs, plus the one that was the 
worst early in this process:

https://pasteboard.co/qiaYnDxScEHM.png

Here's a chart showing the distribution of OSD usage:
https://pasteboard.co/NKITkmZYaGiH.png

It is more or less Gaussian, but the tails seem really long for a 
cluster that's been running the balancer for a long time.

Any guidance would be very much appreciated.

Sincerely

-Dave

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



