How many PGs do you have in your pool? This should be about 100 per OSD. If
it is too low, you can get an imbalance. I don't know the consequences of
changing it on such a full cluster. The default values are only good for
small test environments.

Robert LeBlanc

Sent from a mobile device; please excuse any typos.

On Aug 28, 2014 11:00 AM, "J David" <j.david.lists at gmail.com> wrote:

> Hello,
>
> Is there any way to provoke a ceph cluster to level out its OSD usage?
>
> Currently, a cluster of 3 servers with 4 identical OSDs each is
> showing a disparity of about 20% between the most-used OSD and the
> least-used OSD. This wouldn't be too big of a problem, but the
> most-used OSD is now at 86% (with the least-used at 72%).
>
> There are three more nodes on order, but they are a couple of weeks
> away. Is there anything I can do in the meantime to push existing
> data (and new data) toward less-used OSDs?
>
> Reweighting the OSDs feels intuitively like the wrong approach, since
> they are all the same size and "should" have the same weight. Is that
> the wrong intuition?
>
> Also, with a test cluster, I did try playing around with
> reweight-by-utilization and it actually seemed to make things worse.
> But that cluster was assembled from spare parts, and the OSDs were
> neither all the same size nor uniformly distributed between servers.
> This is *not* a test cluster, so I am gun-shy about possibly making
> things worse.
>
> Is reweight-by-utilization the right knob to poke here? Or is there
> a better tool in the toolbox for this situation?
>
> Here is the OSD tree showing that everything is weighted equally:
>
> # id    weight  type name       up/down reweight
> -1      4.2     root default
> -2      1.4             host f13
> 0       0.35                    osd.0   up      1
> 1       0.35                    osd.1   up      1
> 2       0.35                    osd.2   up      1
> 3       0.35                    osd.3   up      1
> -3      1.4             host f14
> 4       0.35                    osd.4   up      1
> 9       0.35                    osd.9   up      1
> 10      0.35                    osd.10  up      1
> 11      0.35                    osd.11  up      1
> -4      1.4             host f15
> 5       0.35                    osd.5   up      1
> 6       0.35                    osd.6   up      1
> 7       0.35                    osd.7   up      1
> 8       0.35                    osd.8   up      1
>
> And the df output for each node:
>
> Node 1:
>
> /dev/sda2    358G  258G  101G  72%  /var/lib/ceph/osd/ceph-0
> /dev/sdb2    358G  294G   65G  82%  /var/lib/ceph/osd/ceph-1
> /dev/sdc2    358G  278G   81G  78%  /var/lib/ceph/osd/ceph-2
> /dev/sdd2    358G  294G   65G  83%  /var/lib/ceph/osd/ceph-3
>
> Node 2:
>
> /dev/sda2    358G  285G   73G  80%  /var/lib/ceph/osd/ceph-5
> /dev/sdb2    358G  305G   53G  86%  /var/lib/ceph/osd/ceph-6
> /dev/sdc2    358G  301G   58G  85%  /var/lib/ceph/osd/ceph-7
> /dev/sdd2    358G  299G   60G  84%  /var/lib/ceph/osd/ceph-8
>
> Node 3:
>
> /dev/sda2    358G  290G   68G  82%  /var/lib/ceph/osd/ceph-4
> /dev/sdb2    358G  297G   62G  83%  /var/lib/ceph/osd/ceph-11
> /dev/sdc2    358G  285G   73G  80%  /var/lib/ceph/osd/ceph-10
> /dev/sdd2    358G  306G   53G  86%  /var/lib/ceph/osd/ceph-9
>
> Ideally we would like to get about 125 more gigabytes of data (with the
> number of replicas set to 2) onto this pool before the additional nodes
> arrive, which would put *everything* at about 86% if the data were
> evenly balanced. But the way it's currently going, that'll have the
> busiest OSD dangerously close to 95%. (Apparently data increases faster
> than you expect, even if you account for this. :-P )
>
> What's the best way forward?
>
> Thanks for any advice!
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
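
For reference, a rough sketch of how to check and size the pool's PG count
along the lines Robert suggests. The pool name "rbd" is an assumption here
(it is not named in the thread); the replica count of 2 and the 12 OSDs are
from the original message:

    # Current PG count for the pool (pool name assumed):
    ceph osd pool get rbd pg_num
    ceph osd pool get rbd pgp_num

    # Rule of thumb: (number of OSDs * 100) / replica count, rounded up
    # to a power of two. For 12 OSDs and size=2: 12 * 100 / 2 = 600,
    # so 1024 PGs. Note that raising pg_num splits PGs and moves data,
    # which is worth weighing carefully on a cluster this full.
    ceph osd pool set rbd pg_num 1024
    ceph osd pool set rbd pgp_num 1024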
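
And a minimal sketch of the reweighting options discussed in the thread.
The OSD id and the threshold value below are illustrative, not taken from
the thread:

    # Temporarily lower the CRUSH reweight of the fullest OSD so less
    # data maps to it (reweight is a value between 0 and 1; it is reset
    # to 1 if the OSD is marked out and back in):
    ceph osd reweight osd.9 0.9

    # Or let Ceph choose: reweight OSDs whose utilization exceeds the
    # given percentage of the average (here 110, default 120):
    ceph osd reweight-by-utilization 110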