On 04/20/2015 01:46 PM, Robert LeBlanc wrote:
> On Mon, Apr 20, 2015 at 2:34 PM, Colin Corr <colin@xxxxxxxxxxxxx> wrote:
>
> > On 04/20/2015 11:02 AM, Robert LeBlanc wrote:
> > > We have a similar issue, but we wanted three copies across two racks. In the end we increased size to 4 and left min_size at 2. We didn't want to risk having fewer than two copies, and if we only had three copies, losing a rack would block I/O. Once we expand to a third rack, we will adjust our rule and go to size 3. Searching the mailing list and docs proved difficult, so I'll include my rule so that you can use it as a basis. You should be able to just change rack to host and host to osd. If you want to keep only three copies, the "extra" OSD chosen just won't be used, as Gregory mentions. Technically this rule should have "max_size 4", but I won't set a pool over 4 copies so I didn't change it here.
> > >
> > > If anyone has a better way of writing this rule (or one that would work for both a two rack and 3+ rack configuration as mentioned above), I'd be open to it. This is the first rule that I've really written on my own.
> > >
> > > rule replicated_ruleset {
> > >         ruleset 0
> > >         type replicated
> > >         min_size 1
> > >         max_size 10
> > >         step take default
> > >         step choose firstn 2 type rack
> > >         step chooseleaf firstn 2 type host
> > >         step emit
> > > }
> >
> > Thank you Robert. Your example was very helpful. I didn't realize you could nest the choose and chooseleaf steps together. I thought chooseleaf effectively handled that for you already. This makes a bit more sense now.
>
> I'm still a little fuzzy on it myself as well, but not having an emit step between the choose and chooseleaf steps makes chooseleaf operate on the items chosen by choose instead of picking new things from all available entities. I couldn't get crushtool --test --simulate to work properly to confirm (http://tracker.ceph.com/issues/11224), but it is working properly in our cluster.
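To make the nesting concrete, here is a toy Python model of the rack rule quoted above. It is only a sketch and not the real CRUSH algorithm: the two-rack topology, the rack/host/OSD names, and the hash-based pick() helper are all made up for illustration. It only shows the shape of the result: choose selects two racks, chooseleaf then selects two hosts inside each of those racks and one OSD under each host, and the pool size caps how many of the four OSDs are actually used.

```python
import hashlib

# Hypothetical topology: 2 racks, 2 hosts per rack, 2 OSDs per host.
TOPOLOGY = {
    "rack1": {"host1": ["osd.0", "osd.1"], "host2": ["osd.2", "osd.3"]},
    "rack2": {"host3": ["osd.4", "osd.5"], "host4": ["osd.6", "osd.7"]},
}


def pick(items, n, key):
    """Pick up to n distinct items, ordered by a hash of (key, item).

    This stands in for CRUSH's pseudo-random selection; it is NOT the
    real bucket algorithm, just something deterministic per object.
    """
    ranked = sorted(
        items,
        key=lambda item: hashlib.sha256(f"{key}/{item}".encode()).hexdigest(),
    )
    return ranked[:n]


def map_object(obj, pool_size):
    """Toy version of: take default, choose 2 racks, chooseleaf 2 hosts, emit."""
    chosen = []
    # step choose firstn 2 type rack
    for rack in pick(TOPOLOGY, 2, obj):
        # step chooseleaf firstn 2 type host: works inside the rack that
        # choose selected, picking 2 hosts and one OSD (leaf) under each.
        for host in pick(TOPOLOGY[rack], 2, obj):
            chosen.append(pick(TOPOLOGY[rack][host], 1, obj)[0])
    # step emit: the pool's size limits how many of the OSDs are used.
    return chosen[:pool_size]


for size in (3, 4):
    print(size, map_object("some-object", size))
```

With a pool size of 4 the model returns two OSDs from each rack; with a pool size of 3 the fourth OSD is simply dropped, which matches the "extra OSD just won't be used" behaviour described above.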
> Just FYI, min_size and max_size do not change your pools; they only specify which pool sizes the rule works for. Technically if the pool size (replica size) is less than 2 or greater than 3, this rule would not be selected.

Thanks for the help. Reading your comments and re-reading the documentation is helpful in understanding how the rule language works. I had a few misconceptions.

Any thoughts as to what conditions would cause us to end up with more than the specified number of replicas? Is it for recovery scenarios, or is it like a safety rail for flapping OSDs?

It would seem that the default min_size and max_size values (1 and 10) are sufficient for this rule, just as you demonstrated in your rule.

rule host_rule {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 2 type host
        step chooseleaf firstn 2 type osd
        step emit
}

> > My rule looks like this now:
> >
> > rule host_rule {
> >         ruleset 2
> >         type replicated
> >         min_size 2
> >         max_size 3
> >         step take default
> >         step choose firstn 2 type host
> >         step chooseleaf firstn 2 type osd
> >         step emit
> > }
> >
> > And the cluster is reporting the pool as clean, finally. If I understand correctly, we will now potentially have as many as 4 replicas of an object in the pool, 2 on each host.
>
> You will only have 4 replicas if you set the size of your pool to 4; otherwise, if it is the default, it will be three. The rule will support up to 4 replicas.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com