Hi Patrick,
We actually already tried that before (pick three from racks, then the 4th
from hosts). What we ended up with was a PG that was mapped to the same
OSD twice … so we rolled back in the end :)
Then we thought it might be better to ask how to do this as the scenario
described shouldn't be an exotic use case.
Cheers,
Arne
Hey Arne,
So I am not one of the CRUSH wizards by any means, but while we are
waiting for them I wanted to take a crack at it so you weren't left
hanging. You are able to make more complex choices than just a single
chooseleaf statement in your rules. Take the example from the docs
where you want one copy on an SSD and one on platter:
http://ceph.com/docs/master/rados/operations/crush-map/
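If memory serves, the rule on that page looks roughly like this — a sketch from the docs, not something I've re-tested, and "ssd" and "platter" are root buckets defined in that example map:

    rule ssd-primary {
            ruleset 5
            type replicated
            min_size 5
            max_size 10
            step take ssd
            step chooseleaf firstn 1 type host
            step emit
            step take platter
            step chooseleaf firstn -1 type host
            step emit
    }

The two take/emit passes are what let a single rule draw copies from two different parts of the hierarchy.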
So you can either try to build a "do this N times and put the N-x into
this other place" rule (even if the other place is just the same hosts),
or you could just have it iterate at the host level instead of the rack
level. Perhaps
the CRUSH wizards can give you a more elegant solution when they wake
up, but I figured this might get you started down a road to play with.
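For instance, something along these lines might be a starting point for size 4 — an untested sketch on my part, so run it through crushtool before injecting it into your map:

    rule metadata {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take default
            step choose firstn 2 type rack
            step chooseleaf firstn 2 type host
            step emit
    }

That picks 2 racks and then 2 hosts in each, which gets you 4 OSDs on 4 different hosts, though spread over only 2 of your 3 racks — so it trades some rack diversity for always hitting the replica count.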
Shout if you have questions, and good luck!
Best Regards,
Patrick McGarry
Director, Community || Inktank
http://ceph.com || http://inktank.com
@scuttlemonkey || @ceph || @inktank
On Tue, Mar 19, 2013 at 5:55 AM, Arne Wiebalck <Arne.Wiebalck@xxxxxxx> wrote:
Hi all,
We're trying to spread data in our ceph cluster as much as possible,
that is pick different racks, then different hosts, then different OSDs.
It seems to work fine as long as there are enough buckets available,
but if we ask for more replicas than we have racks, for instance, the
requested number of replicas is not achieved:
For example, what we've seen is that with a replication size of 4, a rule
like
rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type rack
        step emit
}
and only 3 (!) racks we get only 3 replicas, like
osdmap e3078 pg 1.ba (1.ba) -> up [116,37,161] acting [116,37,161]
What we'd like is basically that CRUSH tries to find 4 different racks
(for the 4 replicas), and if it finds only 3, picks 4 different hosts
across the 3 racks.
Is there an easy way to do this?
BTW, health is OK despite not having enough replicas for the pool. What's
the best way to detect situations like this, where the actual state
deviates from the desired state?
TIA,
Arne
--
Arne Wiebalck
CERN IT
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com