Hi Christian, all,

Having researched this a bit more, it seemed that just doing

  ceph osd pool set rbd pg_num 128
  ceph osd pool set rbd pgp_num 128

might be the answer. Alas, it was not; after running the above the
cluster just sat there.

Finally, after reading some more, I ran:

  ceph osd reweight-by-utilization

All this accomplished was to move the excess utilization from the first
drive on the affected node to the second drive, e.g.:

------- BEFORE RUNNING: -------
Filesystem      Use%
/dev/sdc1        57%
/dev/sdb1        65%
Filesystem      Use%
/dev/sdc1        90%
/dev/sdb1        75%
Filesystem      Use%
/dev/sdb1        52%
/dev/sdc1        52%
Filesystem      Use%
/dev/sdc1        54%
/dev/sdb1        63%

------- AFTER RUNNING: -------
Filesystem      Use%
/dev/sdc1        57%
/dev/sdb1        65%
Filesystem      Use%
/dev/sdc1        70%  ** these two swapped (roughly) **
/dev/sdb1        92%  ** ^^^^^ ^^^ ^^^^^^^           **
Filesystem      Use%
/dev/sdb1        52%
/dev/sdc1        52%
Filesystem      Use%
/dev/sdc1        54%
/dev/sdb1        63%

root@osd45:~# ceph osd tree
# id    weight  type name       up/down reweight
-1      3.44    root default
-2      0.86            host osd45
0       0.43                    osd.0   up      1
4       0.43                    osd.4   up      1
-3      0.86            host osd42
1       0.43                    osd.1   up      1
5       0.43                    osd.5   up      1
-4      0.86            host osd44
2       0.43                    osd.2   up      1
6       0.43                    osd.6   up      1
-5      0.86            host osd43
3       0.43                    osd.3   up      1
7       0.43                    osd.7   up      0.7007

So this isn't the answer either. Could someone please chime in with an
explanation/suggestion? I suspect it might make sense to use 'ceph osd
reweight osd.7 1' and then run some form of 'ceph osd crush ...'?
(There's a rough sketch of what I mean further down, below my quoted
mail.) Of course, I've read a number of things which suggest that the
two things I've done should already have fixed my problem. Is it
(gasp!) possible that this, as Christian suggests, is a dumpling issue
and that, were I running firefly, what I've done would have sufficed?

Thanks much,
JR

On 9/8/2014 1:50 PM, JR wrote:
> Hi Christian,
>
> I have 448 PGs and 448 PGPs (according to ceph -s).
>
> This seems borne out by:
>
> root@osd45:~# rados lspools
> data
> metadata
> rbd
> volumes
> images
> root@osd45:~# for i in $(rados lspools); do echo "$i: $(ceph osd pool get $i pg_num), $(ceph osd pool get $i pgp_num)"; done
> data: pg_num: 64, pgp_num: 64
> metadata: pg_num: 64, pgp_num: 64
> rbd: pg_num: 64, pgp_num: 64
> volumes: pg_num: 128, pgp_num: 128
> images: pg_num: 128, pgp_num: 128
>
> According to the formula discussed in 'Uneven OSD usage':
>
>   "The formula is actually OSDs * 100 / replication"
>
> In my case:
>
>   8 * 100 / 2 = 400
>
> So I'm erring on the large side?
>
> Or does this formula apply on a per-pool basis? Of my 5 pools I'm
> using 3:
>
> root@osd45:~# rados df | cut -c1-45
> pool name       category                 KB
> data            -                          0
> images          -                          0
> metadata        -                         10
> rbd             -                  568489533
> volumes         -                  594078601
>   total used      2326235048       285923
>   total avail     1380814968
>   total space     3707050016
>
> So should I up the number of PGs for the rbd and volumes pools?
>
> I'll continue looking at docs, but for now I'll send this off.
>
> Thanks very much, Christian.
>
> ps. This cluster is self-contained and all nodes in it are completely
> loaded (i.e., I can't add any more nodes nor disks). It's also not an
> option at the moment to upgrade to firefly (can't make a big change
> before sending it out the door).
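
To be concrete, here is an untested sketch of the idea I mention above;
the 0.40 CRUSH weight is just an example number I picked (all the CRUSH
weights are currently 0.43), so please treat it as illustrative only:

  # Undo the override that reweight-by-utilization put on osd.7,
  # returning its temporary reweight to the default of 1.
  ceph osd reweight 7 1

  # Then lower osd.7's permanent CRUSH weight so CRUSH targets it with
  # less data (0.40 is only an example; its current weight is 0.43).
  ceph osd crush reweight osd.7 0.40

  # Watch the resulting backfill until the cluster is healthy again.
  ceph -w

My understanding is that 'ceph osd reweight' is a temporary override
while 'ceph osd crush reweight' changes the persistent CRUSH weight,
which is why I wonder whether the latter is the more appropriate knob
here; please correct me if I have that backwards.
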
>
> On 9/8/2014 12:09 PM, Christian Balzer wrote:
>>
>> Hello,
>>
>> On Mon, 08 Sep 2014 11:42:59 -0400 JR wrote:
>>
>>> Greetings all,
>>>
>>> I have a small ceph cluster (4 nodes, 2 osds per node) which recently
>>> started showing:
>>>
>>> root@ocd45:~# ceph health
>>> HEALTH_WARN 1 near full osd(s)
>>>
>>> admin@node4:~$ for i in 2 3 4 5; do sudo ssh osd4$i df -h | egrep 'Filesystem|osd/ceph'; done
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/sdc1       442G  249G  194G  57% /var/lib/ceph/osd/ceph-5
>>> /dev/sdb1       442G  287G  156G  65% /var/lib/ceph/osd/ceph-1
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/sdc1       442G  396G   47G  90% /var/lib/ceph/osd/ceph-7
>>> /dev/sdb1       442G  316G  127G  72% /var/lib/ceph/osd/ceph-3
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/sdb1       442G  229G  214G  52% /var/lib/ceph/osd/ceph-2
>>> /dev/sdc1       442G  229G  214G  52% /var/lib/ceph/osd/ceph-6
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/sdc1       442G  238G  205G  54% /var/lib/ceph/osd/ceph-4
>>> /dev/sdb1       442G  278G  165G  63% /var/lib/ceph/osd/ceph-0
>>>
>>>
>> See the very recent "Uneven OSD usage" thread for a discussion about
>> this. What are your PG/PGP values?
>>
>>> This cluster has been running for weeks, under significant load, and has
>>> been 100% stable. Unfortunately we have to ship it out of the building
>>> to another part of our business (where we will have little access to it).
>>>
>>> Based on what I've read about 'ceph osd reweight' I'm a bit hesitant to
>>> just run it (I don't want to do anything that impacts this cluster's
>>> stability).
>>>
>>> Is there another, better way to equalize the distribution of the data
>>> on the OSD partitions?
>>>
>>> I'm running dumpling.
>>>
>> As per the thread and my experience, Firefly would solve this. If you can
>> upgrade during a weekend or whenever there is little to no access, do it.
>>
>> Another option (of course any and all of these will result in data
>> movement, so pick an appropriate time) would be to use "ceph osd
>> reweight" to lower the weight of osd.7 in particular.
>>
>> Lastly, given the utilization of your cluster, you really ought to
>> deploy more OSDs and/or more nodes; if a node went down you'd easily
>> get into a "real" near-full or full situation.
>>
>> Regards,
>>
>> Christian
>>
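
Christian, re your earlier suggestion to lower osd.7's weight by hand:
just to check that I'm reading it correctly, you mean something along
these lines? The 0.85 below is an arbitrary, untested guess on my part
(reweight-by-utilization already took osd.7 down to 0.7007, as shown in
the tree above):

  # Manually down-weight the fullest OSD (osd.7, at 90%) so CRUSH
  # remaps some of its PGs onto the other OSDs; 0.85 is just a guess.
  ceph osd reweight 7 0.85

  # Then watch the data movement and the near-full warning.
  ceph -w
  ceph health detail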