rebalance near full osd

Hi

I've just had a warning (from ceph -s) that one of the osds is near full. Having investigated, I found that osd.6 is 86% full. The data distribution is far from even across my osds, as you can see from the df output below:

/dev/sdj1       2.8T  2.4T  413G  86% /var/lib/ceph/osd/ceph-6
/dev/sdb1       2.8T  2.1T  625G  78% /var/lib/ceph/osd/ceph-0
/dev/sdc1       2.8T  2.0T  824G  71% /var/lib/ceph/osd/ceph-1
/dev/sdd1       2.8T  1.5T  1.3T  55% /var/lib/ceph/osd/ceph-2
/dev/sde1       2.8T  1.7T  1.1T  63% /var/lib/ceph/osd/ceph-3
/dev/sdh1       2.8T  1.7T  1.1T  62% /var/lib/ceph/osd/ceph-4
/dev/sdf1       2.8T  1.9T  932G  67% /var/lib/ceph/osd/ceph-8
/dev/sdi1       2.8T  1.9T  880G  69% /var/lib/ceph/osd/ceph-5
/dev/sdg1       2.8T  2.0T  798G  72% /var/lib/ceph/osd/ceph-7

I seem to have a spread of over 30% in disk utilisation between the osds, despite all of them having identical weights (ceph osd tree output):


 -2 24.56999                     host arh-ibstorage1-ib                                       
  1  2.73000                         osd.1                       up  1.00000          1.00000
  3  2.73000                         osd.3                       up  1.00000          1.00000
  5  2.73000                         osd.5                       up  1.00000          1.00000
  6  2.73000                         osd.6                       up  1.00000          1.00000
  7  2.73000                         osd.7                       up  1.00000          1.00000
  8  2.73000                         osd.8                       up  1.00000          1.00000
  4  2.73000                         osd.4                       up  1.00000          1.00000
  0  2.73000                         osd.0                       up  1.00000          1.00000
  2  2.73000                         osd.2                       up  1.00000          1.00000
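
In case it is useful, I can also pull the per-OSD utilisation and PG counts from Ceph itself; assuming my release has the ceph osd df command (which I believe arrived in Hammer), that would be:

# Per-OSD weight, override reweight, utilisation and PG count, laid out along the CRUSH tree
ceph osd df tree

# Cluster-wide and per-pool usage summary for comparison
ceph df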


What would be the best way to correct this imbalance without significantly impacting client IO on the cluster?
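
For concreteness, would something along these lines be a sensible route? This is only a rough sketch of commands I have not run yet, and it assumes my release supports reweight-by-utilization (and its test- dry-run variant, which I believe appeared around Jewel):

# Throttle backfill/recovery first so the resulting data movement competes less with client IO
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

# Dry run: report which OSDs would be reweighted with a 110% overload threshold
ceph osd test-reweight-by-utilization 110

# Apply it: lowers the override reweight (not the CRUSH weight) of the overfull OSDs
ceph osd reweight-by-utilization 110

# Or nudge just the one near-full OSD down manually
ceph osd reweight 6 0.90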

Many thanks

Andrei