Re: Rules for optimal data placement

Patrick McGarry <patrick@xxxxxxxxxxx> · Tue, 19 Mar 2013 10:09:30 -0400

Hey Arne,

So I am not one of the CRUSH wizards by any means, but while we are
waiting for them I wanted to take a crack at it so you weren't left
hanging.  You can make more complex choices than a single
chooseleaf statement in your rules.  Take the example from the docs
where you want one copy on an SSD and one on platter:

http://ceph.com/docs/master/rados/operations/crush-map/

So you can either try to build a "do this N times and put the N-x into
this other place" rule (even if it's just the same hosts), or you could
just have it iterate at the host level instead of the rack level.
Perhaps the CRUSH wizards can give you a more elegant solution when they
wake up, but I figured this might get you started down a road to play
with.  Shout if you have questions, and good luck!
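
For the host-level option, a minimal sketch would be your own rule with
only the bucket type in the chooseleaf step changed.  Everything else is
copied from your rule, and I have not tested this against your map.
Note the trade-off: it asks for distinct hosts, so with enough hosts you
should get all 4 replicas, but it no longer forces the replicas into
distinct racks.

    rule metadata {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take default
            # was "type rack": pick one OSD under each of N distinct hosts
            step chooseleaf firstn 0 type host
            step emit
    }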

Best Regards,

Patrick McGarry
Director, Community || Inktank

http://ceph.com  ||  http://inktank.com
@scuttlemonkey || @ceph || @inktank

On Tue, Mar 19, 2013 at 5:55 AM, Arne Wiebalck <Arne.Wiebalck@xxxxxxx> wrote:
> Hi all,
>
> We're trying to spread data in our ceph cluster as much as possible,
> that is pick different racks, then different hosts, then different OSDs.
> It seems to work fine as long as there are enough buckets available,
> but if we ask for more replicas than we have racks, for instance, the
> requested number of replicas is not achieved:
>
> For example, what we've seen is that with a replication size of 4, a rule
> like
>
> rule metadata {
>         ruleset 1
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type rack
>         step emit
> }
>
> and only 3 (!) racks we get only 3 replicas, like
>
> osdmap e3078 pg 1.ba (1.ba) -> up [116,37,161] acting [116,37,161]
>
> What we'd like is basically that crush tries to find 4 different racks
> (for the 4 replicas), and if it finds only 3, pick 4 different hosts
> across the 3 racks.
>
> Is there an easy way to do this?
>
> BTW, health is OK despite having not enough replicas for the pool. What's
> the best way to detect such situations where the actual state deviates from
> the desired state?
>
> TIA,
>  Arne
>
> --
> Arne Wiebalck
> CERN IT
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>