On Mon, Apr 20, 2015 at 2:34 PM, Colin Corr <colin@xxxxxxxxxxxxx> wrote:
> On 04/20/2015 11:02 AM, Robert LeBlanc wrote:
>> We have a similar issue, but we wanted three copies across two racks. It turns out we increased size to 4 and left min_size at 2. We didn't want to risk having fewer than two copies, and if we only had three copies, losing a rack would block I/O. Once we expand to a third rack, we will adjust our rule and go to size 3. Searching the mailing list and docs proved difficult, so I'll include my rule so that you can use it as a basis. You should be able to just change rack to host and host to osd. If you want to keep only three copies, the "extra" OSD chosen just won't be used, as Gregory mentions. Technically this rule should have "max_size 4", but I won't set a pool over 4 copies, so I didn't change it here.
>>
>> If anyone has a better way of writing this rule (or one that would work for both a two-rack and a 3+ rack configuration, as mentioned above), I'd be open to it. This is the first rule that I've really written on my own.
>>
>> rule replicated_ruleset {
>>     ruleset 0
>>     type replicated
>>     min_size 1
>>     max_size 10
>>     step take default
>>     step choose firstn 2 type rack
>>     step chooseleaf firstn 2 type host
>>     step emit
>> }
> Thank you Robert. Your example was very helpful. I didn't realize you could nest the choose and chooseleaf steps together. I thought chooseleaf effectively handled that for you already. This makes a bit more sense now.
I'm still a little fuzzy on it myself as well, but not having an emit step between the choose and the chooseleaf makes the chooseleaf operate on the items chosen by the choose step, instead of picking new items from all available entities. I couldn't get crushtool --test --simulate to work properly to confirm (http://tracker.ceph.com/issues/11224), but it is working properly in our cluster. Just FYI, min_size and max_size do not change your pools; they only specify which sizes the rule applies to. Technically, if the pool size (replica size) is less than 2 or greater than 3, this rule would not be selected.
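For what it's worth, plain --test with --show-mappings does work for me even with --simulate broken. Here is a rough sketch of the edit/verify cycle (the file names, rule number, and rep count are just example placeholders; adjust them to your map and pool):

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt      # decompile so you can edit the rules
    crushtool -c crushmap.txt -o crushmap.new      # recompile after editing
    crushtool -i crushmap.new --test --rule 2 --num-rep 3 --show-mappings
    ceph osd setcrushmap -i crushmap.new           # inject only once the mappings look sane

The --test run prints the OSDs each input maps to for the given ruleset and replica count, so you can eyeball whether the choose/chooseleaf nesting is picking what you expect before touching the live cluster.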
> My rule looks like this now:
> rule host_rule {
>     ruleset 2
>     type replicated
>     min_size 2
>     max_size 3
>     step take default
>     step choose firstn 2 type host
>     step chooseleaf firstn 2 type osd
>     step emit
> }
> And the cluster is reporting the pool as clean, finally. If I understand correctly, we will now potentially have as many as 4 replicas of an object in the pool, 2 on each host.
You will only have 4 replicas if you set the size of your pool to 4; otherwise, if it is left at the default, it will be three. The rule will support up to 4 replicas.
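If you do want four copies, it's just a matter of something like the following (substitute your pool name for <pool>; the ruleset number 2 matches your host_rule above):

    ceph osd pool set <pool> crush_ruleset 2    # point the pool at host_rule
    ceph osd pool set <pool> size 4             # four copies, two per host
    ceph osd pool set <pool> min_size 2         # still serve I/O with two copies left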
> On Mon, Apr 20, 2015 at 11:50 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> It's actually pretty hacky: you configure your CRUSH rule to return
>> two OSDs from each host, but set your size to 3. You'll want to test
>> this carefully with your installed version to make sure that works,
>> though — older CRUSH implementations would crash if you did that. :(
>>
>> In slightly more detail, you'll need to change it so that instead of
>> using "chooseleaf" you "choose" 2 hosts, and then choose or chooseleaf
>> 2 OSDs from each of those hosts. If you search the list archives for
>> CRUSH threads you'll find some other discussions about doing precisely
>> this, and I think the CRUSH documentation should cover the more
>> general bits of how the language works.
>> -Greg
> Thank you Greg, I had trouble searching for discussions related to this. The Google was not being friendly, or I wasn't issuing a good query. My understanding of choose vs. chooseleaf and using multiple choose-type steps in a rule will send me back to the docs for the remainder of my day.
>
> Thanks,
> Colin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com