Hello,

On Wed, 6 Apr 2016 04:18:40 +0100 (BST) Andrei Mikhailovsky wrote:

> Hi
>
> I've just had a warning (from ceph -s) that one of the osds is near
> full. Having investigated the warning, I've found that osd.6 is 86%
> full. The data distribution is nowhere near equal on my osds, as you
> can see from the df command output below:
>
Firstly, read the very recent thread "dealing with the full osd / help
reweight" from this ML.

You really want monitoring SW to keep track of disk utilization if you're
not doing it manually.

> /dev/sdj1  2.8T  2.4T  413G  86%  /var/lib/ceph/osd/ceph-6
> /dev/sdb1  2.8T  2.1T  625G  78%  /var/lib/ceph/osd/ceph-0
> /dev/sdc1  2.8T  2.0T  824G  71%  /var/lib/ceph/osd/ceph-1
> /dev/sdd1  2.8T  1.5T  1.3T  55%  /var/lib/ceph/osd/ceph-2
> /dev/sde1  2.8T  1.7T  1.1T  63%  /var/lib/ceph/osd/ceph-3
> /dev/sdh1  2.8T  1.7T  1.1T  62%  /var/lib/ceph/osd/ceph-4
> /dev/sdf1  2.8T  1.9T  932G  67%  /var/lib/ceph/osd/ceph-8
> /dev/sdi1  2.8T  1.9T  880G  69%  /var/lib/ceph/osd/ceph-5
> /dev/sdg1  2.8T  2.0T  798G  72%  /var/lib/ceph/osd/ceph-7
>
> I seem to have a spread of over 30% disk utilisation between the osds,
> despite all my osds having identical weights (ceph osd tree output):
>
> -2 24.56999     host arh-ibstorage1-ib
>  1  2.73000         osd.1    up  1.00000  1.00000
>  3  2.73000         osd.3    up  1.00000  1.00000
>  5  2.73000         osd.5    up  1.00000  1.00000
>  6  2.73000         osd.6    up  1.00000  1.00000
>  7  2.73000         osd.7    up  1.00000  1.00000
>  8  2.73000         osd.8    up  1.00000  1.00000
>  4  2.73000         osd.4    up  1.00000  1.00000
>  0  2.73000         osd.0    up  1.00000  1.00000
>  2  2.73000         osd.2    up  1.00000  1.00000
>
This is just one host, are they all like that?
Please post the full osd tree and, even more importantly, a "ceph -s"
output.

Ceph isn't particularly good at creating an even distribution, but if you
have too few PGs it gets worse, which would be my first suspicion here.

> What would be the best way to correct the issue without having
> significant impact on the cluster IO?
>
Again, read the thread above.

Increasing the PG count (if that is part of your problem) will have a
massive impact, but it needs to be done at some point.

Re-weighting OSDs (via CRUSH, permanently) in small increments/decrements
(you want to keep the host weight more or less the same) of course also
causes data movement, but done right (see the thread) the impact can be
minimized. A rough command sketch follows at the end of this mail.

Christian

> Many thanks
>
> Andrei

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
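
P.S.: a minimal command sketch of the above, purely as an illustration.
The pool name "rbd", the target pg_num of 1024 and the new weight of 2.65
are placeholders, not recommendations for your cluster:

  # Check per-OSD utilization and the current PG count first:
  ceph osd df
  ceph osd pool get rbd pg_num

  # If the PG count is too low for the number of OSDs, raise pg_num and
  # then pgp_num (in steps, and only while the cluster is healthy):
  ceph osd pool set rbd pg_num 1024
  ceph osd pool set rbd pgp_num 1024

  # Permanent CRUSH re-weighting in small decrements, e.g. lowering the
  # fullest OSD from 2.73 to 2.65; watch "ceph -s" and wait for the data
  # movement to finish before the next adjustment:
  ceph osd crush reweight osd.6 2.65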