On Mon, 20 Jan 2014, Arnulf Heimsbakk wrote:
> Hi,
>
> I'm trying to understand the CRUSH algorithm and how it distributes data.
> Let's say I simplify a small datacenter setup and map it up
> hierarchically in the crush map as shown below.
>
>               root             datacenter
>              /    \
>             /      \
>            /        \
>           a          b         room
>         / | \      / | \
>      a1  a2  a3  b1  b2  b3    rack
>      |   |   |   |   |   |
>      h1  h2  h3  h4  h5  h6    host
>
> I want 4 copies of all data in my pool, configured on pool level. 2
> copies in each room. And I want to be sure no 2 copies reside in the
> same rack when there are no HW failures.
>
> Will the chooseleaf rule below ensure this placement?
>
> step take root
> step chooseleaf firstn 0 type room
> step emit

This won't ensure the 2 copies in each room are in different racks.

> Or do I have to specify this more, like
>
> step take root
> step choose firstn 2 type room
> step chooseleaf firstn 2 type rack
> step emit

I think this is what you want.  The thing it won't do is decide to put 4
replicas in room b when room a goes down completely... but at that scale,
that is generally not what you want anyway.

> Or even more, like?
>
> step take a
> step choose firstn 2 type rack
> step chooseleaf firstn 1 type host
> step emit
> step take b
> step choose firstn 2 type rack
> step chooseleaf firstn 1 type host
> step emit
>
> Is there a difference in failure behaviour in the different configurations?

This would work too, but assumes you only have 2 rooms, and that you
always want the primary copy to be in room a (which means the reads go
there).  The previous rule will spread the primary responsibility across
both rooms.

sage
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
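
For reference, a minimal sketch of what the recommended rule (choose 2 rooms,
then chooseleaf 2 racks in each) could look like as a complete crushmap rule,
and how a pool might be pointed at it.  The rule name "replicated_2rooms", the
ruleset id 1, and the pool name "mypool" are illustrative placeholders, not
taken from the thread:

    # crushmap rule sketch: 4 replicas, 2 per room, no 2 in the same rack
    rule replicated_2rooms {
            ruleset 1
            type replicated
            min_size 4
            max_size 4
            step take root
            step choose firstn 2 type room
            step chooseleaf firstn 2 type rack
            step emit
    }

    # set the replica count on the pool and point it at the rule
    ceph osd pool set mypool size 4
    ceph osd pool set mypool crush_ruleset 1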