How many PGs do you have in your pool? This should be about 100 per OSD. If
it is too low, you can get an imbalance. I don't know the consequences of
changing it on such a full cluster. The default values are only good for
small test environments.

Robert LeBlanc

Sent from a mobile device; please excuse any typos.

On Aug 28, 2014 11:00 AM, "J David" <j.david.lists at gmail.com> wrote:

> Hello,
>
> Is there any way to provoke a ceph cluster to level out its OSD usage?
>
> Currently, a cluster of 3 servers with 4 identical OSDs each is
> showing a disparity of about 20% between the most-used OSD and the
> least-used OSD. This wouldn't be too big of a problem, but the
> most-used OSD is now at 86% (with the least-used at 72%).
>
> There are three more nodes on order, but they are a couple of weeks
> away. Is there anything I can do in the meantime to push existing
> data (and new data) toward less-used OSDs?
>
> Reweighting the OSDs feels intuitively like the wrong approach, since
> they are all the same size and "should" have the same weight. Is that
> the wrong intuition?
>
> Also, with a test cluster, I did try playing around with
> reweight-by-utilization and it actually seemed to make things worse.
> But that cluster was assembled from spare parts, and the OSDs were
> neither all the same size nor uniformly distributed between servers.
> This is *not* a test cluster, so I am gun-shy about possibly making
> things worse.
>
> Is reweight-by-utilization the right knob to poke here? Or is there
> a better tool in the toolbox for this situation?
>
> Here is the OSD tree showing that everything is weighted equally:
>
> # id    weight  type name       up/down reweight
> -1      4.2     root default
> -2      1.4             host f13
> 0       0.35                    osd.0   up      1
> 1       0.35                    osd.1   up      1
> 2       0.35                    osd.2   up      1
> 3       0.35                    osd.3   up      1
> -3      1.4             host f14
> 4       0.35                    osd.4   up      1
> 9       0.35                    osd.9   up      1
> 10      0.35                    osd.10  up      1
> 11      0.35                    osd.11  up      1
> -4      1.4             host f15
> 5       0.35                    osd.5   up      1
> 6       0.35                    osd.6   up      1
> 7       0.35                    osd.7   up      1
> 8       0.35                    osd.8   up      1
>
> And the df output for each node:
>
> Node 1:
>
> /dev/sda2    358G  258G  101G  72%  /var/lib/ceph/osd/ceph-0
> /dev/sdb2    358G  294G   65G  82%  /var/lib/ceph/osd/ceph-1
> /dev/sdc2    358G  278G   81G  78%  /var/lib/ceph/osd/ceph-2
> /dev/sdd2    358G  294G   65G  83%  /var/lib/ceph/osd/ceph-3
>
> Node 2:
>
> /dev/sda2    358G  285G   73G  80%  /var/lib/ceph/osd/ceph-5
> /dev/sdb2    358G  305G   53G  86%  /var/lib/ceph/osd/ceph-6
> /dev/sdc2    358G  301G   58G  85%  /var/lib/ceph/osd/ceph-7
> /dev/sdd2    358G  299G   60G  84%  /var/lib/ceph/osd/ceph-8
>
> Node 3:
>
> /dev/sda2    358G  290G   68G  82%  /var/lib/ceph/osd/ceph-4
> /dev/sdb2    358G  297G   62G  83%  /var/lib/ceph/osd/ceph-11
> /dev/sdc2    358G  285G   73G  80%  /var/lib/ceph/osd/ceph-10
> /dev/sdd2    358G  306G   53G  86%  /var/lib/ceph/osd/ceph-9
>
> Ideally we would like to get about 125 more gigabytes of data (with the
> number of replicas set to 2) onto this pool before the additional nodes
> arrive, which would put *everything* at about 86% if the data were
> evenly balanced. But the way it's currently going, that'll have the
> busiest OSD dangerously close to 95%. (Apparently data increases faster
> than you expect, even if you account for this. :-P )
>
> What's the best way forward?
>
> Thanks for any advice!
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
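
For reference, a rough sketch of how to check and size the pool's PG count
along the lines Robert suggests. The pool name "rbd" is an assumption here
(it is not named in the thread); the replica count of 2 and the 12 OSDs are
from the original message:

    # Current PG count for the pool (pool name assumed):
    ceph osd pool get rbd pg_num
    ceph osd pool get rbd pgp_num

    # Rule of thumb: (number of OSDs * 100) / replica count, rounded up
    # to a power of two. For 12 OSDs and size=2: 12 * 100 / 2 = 600,
    # so 1024 PGs. Note that raising pg_num splits PGs and moves data,
    # which is worth weighing carefully on a cluster this full.
    ceph osd pool set rbd pg_num 1024
    ceph osd pool set rbd pgp_num 1024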
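
And a minimal sketch of the reweighting options discussed in the thread.
The OSD id and the threshold value below are illustrative, not taken from
the thread:

    # Temporarily lower the CRUSH reweight of the fullest OSD so less
    # data maps to it (reweight is a value between 0 and 1; it is reset
    # to 1 if the OSD is marked out and back in):
    ceph osd reweight osd.9 0.9

    # Or let Ceph choose: reweight OSDs whose utilization exceeds the
    # given percentage of the average (here 110, default 120):
    ceph osd reweight-by-utilization 110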