Hello,

On Mon, 08 Sep 2014 11:42:59 -0400 JR wrote:

> Greetings all,
>
> I have a small ceph cluster (4 nodes, 2 osds per node) which recently
> started showing:
>
> root@ocd45:~# ceph health
> HEALTH_WARN 1 near full osd(s)
>
> admin@node4:~$ for i in 2 3 4 5; do sudo ssh osd4$i df -h |egrep 'Filesystem|osd/ceph'; done
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdc1       442G  249G  194G  57% /var/lib/ceph/osd/ceph-5
> /dev/sdb1       442G  287G  156G  65% /var/lib/ceph/osd/ceph-1
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdc1       442G  396G   47G  90% /var/lib/ceph/osd/ceph-7
> /dev/sdb1       442G  316G  127G  72% /var/lib/ceph/osd/ceph-3
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdb1       442G  229G  214G  52% /var/lib/ceph/osd/ceph-2
> /dev/sdc1       442G  229G  214G  52% /var/lib/ceph/osd/ceph-6
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdc1       442G  238G  205G  54% /var/lib/ceph/osd/ceph-4
> /dev/sdb1       442G  278G  165G  63% /var/lib/ceph/osd/ceph-0
>
See the very recent "Uneven OSD usage" thread for a discussion about this.
What are your PG/PGP values?

> This cluster has been running for weeks, under significant load, and has
> been 100% stable. Unfortunately we have to ship it out of the building
> to another part of our business (where we will have little access to it).
>
> Based on what I've read about 'ceph osd reweight' I'm a bit hesitant to
> just run it (I don't want to do anything that impacts this cluster's
> stability).
>
> Is there another, better way to equalize the distribution of the data on
> the osd partitions?
>
> I'm running dumpling.
>
As per that thread and my experience, Firefly would solve this. If you can
upgrade during a weekend or whenever there is little to no access, do it.

Another option (any and all of these will of course result in data
movement, so pick an appropriate time) would be to use "ceph osd reweight"
to lower the weight of osd.7 in particular.
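To make the reweight option a bit more concrete, here is a minimal shell sketch. The function name suggest_reweights, the 5-point margin and the "mean/use" scaling rule are my own assumptions for illustration, not something Ceph prescribes. It parses df output like yours, prints the mean Use%, and prints (but does not run) a candidate "ceph osd reweight <id> <weight>" command for each OSD well above the mean. With your numbers the mean is 63.1%, and it flags osd.7 (candidate 0.70) and, at the default margin, osd.3 (candidate 0.88).

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): reads `df -h` output for OSD
# mounts on stdin, prints the mean Use%, and for every OSD more than MARGIN
# percentage points above that mean prints a candidate
# `ceph osd reweight <id> <weight>` command. Nothing is executed.
# Assumptions: mounts are named /var/lib/ceph/osd/ceph-N, and the
# candidate weight is simply mean/use, an illustrative heuristic.
suggest_reweights() {
  margin="${1:-5}"
  awk -v margin="$margin" '
    $6 ~ /\/var\/lib\/ceph\/osd\/ceph-[0-9]+$/ {
      id = $6;  sub(/.*ceph-/, "", id)        # OSD id from the mount point
      pct = $5; sub(/%/, "", pct); pct += 0   # Use% as a number
      use[id] = pct; sum += pct; n++
    }
    END {
      if (n == 0) exit 1                      # no OSD lines found
      mean = sum / n
      printf "mean use: %.1f%%\n", mean
      for (id in use)                         # output order is unspecified
        if (use[id] > mean + margin)
          printf "ceph osd reweight %s %.2f\n", id, mean / use[id]
    }'
}

# Example: the df lines from the report above.
suggest_reweights 5 <<'EOF'
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdc1       442G  249G  194G  57% /var/lib/ceph/osd/ceph-5
/dev/sdb1       442G  287G  156G  65% /var/lib/ceph/osd/ceph-1
/dev/sdc1       442G  396G   47G  90% /var/lib/ceph/osd/ceph-7
/dev/sdb1       442G  316G  127G  72% /var/lib/ceph/osd/ceph-3
/dev/sdb1       442G  229G  214G  52% /var/lib/ceph/osd/ceph-2
/dev/sdc1       442G  229G  214G  52% /var/lib/ceph/osd/ceph-6
/dev/sdc1       442G  238G  205G  54% /var/lib/ceph/osd/ceph-4
/dev/sdb1       442G  278G  165G  63% /var/lib/ceph/osd/ceph-0
EOF
```

Use% from df is only a rough proxy for what Ceph itself sees, and dropping osd.7 straight to 0.70 would move a lot of its data at once. Given your concern about stability, stepping the weight down gradually while watching "ceph health" is the more cautious route; a larger margin (e.g. "suggest_reweights 10") limits the suggestion to osd.7 alone.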
Lastly, given the utilization of your cluster, you really ought to deploy
more OSDs and/or more nodes; if a node were to go down, you'd easily get
into a "real" near full or full situation.

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/