On Wed, 4 May 2011, Zenon Panoussis wrote:
> On 05/04/2011 08:21 PM, Sage Weil wrote:
> 
> >> does "min_size 2, max_size 2" mean that I want "2 copies of the data on each
> >> host" or "2 copies of the data in total in the entire cluster"?
> 
> > Neither, actually.  It means that this rule will be used when we ask crush
> > for ruleset 0 and 2 replicas.  If you change a pg to have 3x replication,
> > ceph will ask for ruleset 0 and 3 replicas, and this rule won't be used.
> 
> In other words, the total number of replicas in the cluster is determined on
> the PG level? But then how do I control which PGs are physically stored where?
> 
> > You probably want min_size 1 and max_size 10.
> 
> Taking what you just wrote together with a re-reading of the wiki, I must admit
> that I still don't quite grasp it. The wiki says
> 
>   That is, when placing object replicas, we start at the root hierarchy, and
>   choose N items of type 'device'. ('0' means to grab however many replicas.
>   The rules are written to be general for some range of N, 1-10 in this case.)
> 
> What I make out of all this is that
> 
>  rule data {
>          ruleset 0
>          type replicated
>          min_size 1
>          max_size 10
>          step take root
>          step choose firstn 0 type device
>          step emit
>  }
> 
> means that IF the PGs are set to create anything between 1 and 10 replicas, then
> the replicas should be placed on devices, using an unlimited number of devices.
> 
> Is that correct?
> 
> My problem really is how to configure ceph to put exactly 1 replica of the data
> (and metadata) on each and every one of some kind of target. For example, if I
> have 10 racks, I want exactly 1 copy of the data in each rack, no more, no less
> (and I don't care which host in that rack the data lands on). If I have 10 hosts,
> I want exactly 1 copy of the data on each host (and I don't care which OSD on
> that host the data lands on). If I only have 10 OSDs, I want exactly 1 copy of
> the data on each and every OSD.
> 
> Assuming that the number of targets is fixed and known, what is the way to do
> this?

Yes.  So the rule you have is right (at least up to 10 nodes).  Then you
need to set the pg_size (aka replication level) for the pools you care
about.  For 4x, that's

 ceph osd pool set data size 4

You can see the current sizes with

 ceph osd dump -o - | grep pool

and look at the pg_size attribute.
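
For example, to go from the default 2x to 3x on the 'data' pool and then
check that it took (the pool name and size here are just an example),
something like

 ceph osd pool set data size 3
 ceph osd dump -o - | grep pool

should do it; the 'data' line in the dump will then show pg_size 3, and
ceph will create the extra replicas for existing pgs in the background.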
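
As for putting exactly one copy on each host (or each rack): the usual
way is to have the rule choose buckets of that type first, and then one
device beneath each of them.  Assuming your crush map actually defines
'host' buckets between 'root' and the devices (adjust the names to match
your map; this is just a sketch, not tested against your setup), it would
look something like

 rule data {
         ruleset 0
         type replicated
         min_size 1
         max_size 10
         step take root
         step choose firstn 0 type host
         step choose firstn 1 type device
         step emit
 }

With pg_size set to the number of hosts, each pg then ends up with one
replica on one device in each chosen host.  Substitute 'rack' for 'host'
to spread copies across racks instead; and since crush never picks the
same device twice for a pg, the plain 'choose firstn 0 type device' rule
you already have covers the one-copy-per-OSD case.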

> And going back to PGs, if "ceph osd dump -o -|grep pg_size" says
> 
> pg_pool 0 'data' pg_pool(rep pg_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 lpg_num 2 lpgp_num 2 last_change 66 owner 0)
> 
> and "ceph -w" says
> 
> pg v319405: 528 pgs: 528 active+clean; 22702 MB data, 77093 MB used, 346 GB / 446 GB avail
> 
> how do the 128 PGs of "ceph osd dump" relate to the 528 PGs of "ceph -w"?

There are several different pools, each sliced into its own pgs; the 528
that 'ceph -w' reports is the total across all of them, while 'ceph osd
dump' shows the pg_num of each pool individually.

> As an aside, I think that, to a certain extent, improving the
> documentation could contribute more to the code base than improving the
> actual code. You guys spend a lot of time answering the kind of
> questions that I've been posing (and thank you for doing so), while at
> the same time missing out on the debugging help you could be getting
> instead if your user base could move past its trivial problems. If I
> were your scrum master, I'd dedicate an entire sprint to the wiki alone.

The replication is covered by
http://ceph.newdream.net/wiki/Adjusting_replication_level

Any specific suggestions on how that should be improved?

sage