Re: Rules for optimal data placement

Hi Patrick,

We actually already tried that before (get three from racks, then the 4th
from hosts). What we ended up with was a pg that was supposed to go
on the same osd twice … so we rolled back in the end :)

Then we thought it might be better to ask how to do this, as the scenario
described shouldn't be an exotic use case.

Cheers,
 Arne

--
Arne Wiebalck
CERN IT

On Mar 19, 2013, at 3:09 PM, Patrick McGarry <patrick@xxxxxxxxxxx> wrote:

Hey Arne,

So I am not one of the CRUSH-wizards by any means, but while we are
waiting for them I wanted to take a crack at it so you weren't left
hanging.  You can make more complex choices than just a single
chooseleaf statement in your rules.  Take the example from the docs
where you want one copy on an SSD and one on platter:

http://ceph.com/docs/master/rados/operations/crush-map/

So you can either try to build a "do this N times and put the N-x into
this other place" rule (even if it's just the same hosts), or you could
just have it iterate at the host level instead of the rack level.  Perhaps
the CRUSH wizards can give you a more elegant solution when they wake
up, but I figured this might get you started down a road to play with.
Shout if you have questions, and good luck!
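To make that a bit more concrete, a rule along these lines is roughly what
I had in mind -- completely untested on my side (the rule name "spread" and
the firstn counts are just placeholders), so treat it as a sketch to
experiment with rather than something known-good:

rule spread {
       ruleset 2
       type replicated
       min_size 1
       max_size 10
       step take default
       step choose firstn 2 type rack
       step chooseleaf firstn 2 type host
       step emit
}

The idea is: first pick 2 racks, then 2 hosts (leaves) in each of them,
which gets you 4 OSDs even when you have fewer racks than replicas. The
trade-off is that it no longer spreads across 4 distinct racks even when
more racks are available.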


Best Regards,


Patrick McGarry
Director, Community || Inktank

http://ceph.com  ||  http://inktank.com
@scuttlemonkey || @ceph || @inktank


On Tue, Mar 19, 2013 at 5:55 AM, Arne Wiebalck <Arne.Wiebalck@xxxxxxx> wrote:
Hi all,

We're trying to spread data in our ceph cluster as much as possible,
that is pick different racks, then different hosts, then different OSDs.
It seems to work fine as long as there are enough buckets available,
but if we ask for more replicas than we have racks, for instance, the
requested number of replicas is not achieved:

For example, what we've seen is that with a replication size of 4, a rule
like

rule metadata {
       ruleset 1
       type replicated
       min_size 1
       max_size 10
       step take default
       step chooseleaf firstn 0 type rack
       step emit
}

and only 3 (!) racks, we get only 3 replicas:

osdmap e3078 pg 1.ba (1.ba) -> up [116,37,161] acting [116,37,161]

What we'd like is basically that CRUSH tries to find 4 different racks
(for the 4 replicas), and if it finds only 3, picks 4 different hosts
across the 3 racks.

Is there an easy way to do this?

BTW, health is OK despite not having enough replicas for the pool. What's
the best way to detect such situations, where the actual state deviates
from the desired state?

TIA,
Arne

--
Arne Wiebalck
CERN IT

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

