On Wed, Aug 22, 2018 at 12:56 AM Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
> Hi everyone,
>
> I read an earlier thread [1] that made a good explanation on the 'step
> choose|chooseleaf' option. Could someone further help me to understand
> the 'firstn|indep' part? Also, what is the relationship between 'step
> take' and 'step choose|chooseleaf' when it comes to define a failure
> domain?
>
> Thank you very much.
This is documented in CRUSH Map Rules [1]
[1]
http://docs.ceph.com/docs/master/rados/operations/crush-map-edits/#crush-map-rules
But that doesn't seem to really discuss it, and I don't see it elsewhere in our docs either. So:
"indep" and "firstn" are two different strategies for selecting items (mostly OSDs) in a CRUSH hierarchy. If you're storing EC data, you want to use indep; if you're storing replicated data, you want to use firstn.
The reason has to do with how they behave when a previously selected device fails. Let's say you have a PG stored on OSDs 1, 2, 3, 4, 5. Then 3 goes down.
With the "firstn" mode, CRUSH adjusts its calculation so that it selects 1 and 2, then selects 3 but discovers it's down, so it retries and selects 4 and 5, and then goes on to select a new OSD, 6. So the final CRUSH mapping change is:
1, 2, 3, 4, 5 -> 1, 2, 4, 5, 6.
But if you're storing an EC pool, that means you just changed the data mapped to OSDs 4, 5, and 6! In an EC pool each position in the list holds a different shard, so 4 and 5 would now have to hold different shards than before, on top of 6 receiving a new one. That's terrible! So the "indep" mode attempts to not do that. (It still *might* conflict, but the odds are much lower.) You can instead expect it, when it selects the failed 3, to try again and pick out 6, for a final transformation of:
1, 2, 3, 4, 5 -> 1, 2, 6, 4, 5
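To make the difference concrete, here is a toy Python sketch of the two behaviours. It is not the real CRUSH algorithm: the function names, the fixed candidate list standing in for CRUSH's pseudo-random draws, and the way a replacement is picked are all assumptions made purely for illustration. It only reproduces the ordering effect described above.

```python
def select_firstn(candidates, n, down):
    """Toy 'firstn': walk one ordered candidate list, skipping failed OSDs.

    Everything after a failed OSD shifts one position to the left.
    """
    chosen = []
    for osd in candidates:
        if osd in down or osd in chosen:
            continue
        chosen.append(osd)
        if len(chosen) == n:
            break
    return chosen


def select_indep(candidates, n, down):
    """Toy 'indep': every slot keeps its original pick; only failed slots are re-drawn.

    Assumes the first n candidates are distinct. None marks a slot that
    could not be refilled (hypothetical convention for this sketch).
    """
    slots = list(candidates[:n])
    used = {osd for osd in slots if osd not in down}
    spares = [osd for osd in candidates[n:] if osd not in down]
    for i, osd in enumerate(slots):
        if osd in down:
            replacement = next((s for s in spares if s not in used), None)
            slots[i] = replacement
            if replacement is not None:
                used.add(replacement)
    return slots


# Hypothetical candidate order; real CRUSH derives this from hashing.
candidates = [1, 2, 3, 4, 5, 6, 7, 8]
down = {3}

print(select_firstn(candidates, 5, down))  # [1, 2, 4, 5, 6]
print(select_indep(candidates, 5, down))   # [1, 2, 6, 4, 5]
```

In this sketch firstn changes three positions (the last three), while indep changes only the one position that held the failed OSD, which is why indep suits EC pools where each position holds a specific shard.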
-Greg
k
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com