On Thu, Oct 28, 2021 at 8:34 AM Mykola Golub <to.my.trociny@xxxxxxxxx> wrote:
>
> Hi,
>
> I have questions about the "stretch mode" feature [1].
>
> 1) In the limitations section [2] it is stated that EC pool support
> is not implemented yet. Does anyone know what things are missing? I
> understand that there should be a restriction on K+M values; I suppose
> only profiles with M >= K+2 should be allowed, so that we would have
> at least K+1 shards on every site.

I think there were some other incomplete items that kept me from
wanting to support EC in stretch clusters, but unfortunately I can't
remember what they were off-hand. It might simply have been a lack of
confidence that I could correctly set up the peering minimum
requirements for EC pools: replication just needs a peer in each site,
but EC needs to guarantee recoverability in each site, and that is not
a trivial problem in general. You could try to gate peering on the
recoverable check function provided by the EC plugin, but I'm not sure
that is the only thing needed.

It does not surprise me that things *seem* to work at a basic level,
though; I definitely aimed to make the data structures and logic
future-compatible with more than 2 sites and with erasure coding!

> And for a 2+4 profile the rule could be:
>
> rule stretch_rule {
>         id 1
>         type replicated
>         step take site1
>         step chooseleaf indep 3 type host
>         step emit
>         step take site2
>         step chooseleaf indep 3 type host
>         step emit
> }
>
> This is based on the example for a replicated pool provided in the
> stretch mode documentation.
>
> And I just tried it, and to make it (apparently) work I only had to
> remove the EC pool restrictions in the code. So I wonder what I am
> missing.
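[Editorial aside, not part of the original thread: a minimal sketch of the arithmetic behind the proposed M >= K+2 restriction, assuming shards are split evenly between exactly two sites and that a site can serve data on its own only while it holds at least K+1 shards (K to reconstruct, plus one so a single OSD loss there stays recoverable). The function name and the even-split assumption are the editor's, not Ceph's.]

```python
def site_can_serve_alone(k: int, m: int) -> bool:
    """Return True if, with K+M shards split evenly across two sites,
    each site holds at least K+1 shards and so could satisfy an EC
    min_size of K+1 by itself after losing the other site."""
    total = k + m
    if total % 2 != 0:          # uneven split: one site gets fewer shards
        return False
    per_site = total // 2
    return per_site >= k + 1    # algebraically equivalent to m >= k + 2

# The 2+4 profile from the thread qualifies; common wide profiles do not.
assert site_can_serve_alone(2, 4) is True    # 3 shards per site, need 3
assert site_can_serve_alone(4, 2) is False   # 3 shards per site, need 5
assert site_can_serve_alone(8, 4) is False   # 6 shards per site, need 9
```

This only captures the shard-count constraint; as the reply notes, peering correctness for EC in stretch mode involves more than this arithmetic.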
> 2) Looking at the example for a replicated pool in the doc [1] (which
> I used to make a rule for a replicated pool):
>
> rule stretch_rule {
>         id 1
>         type replicated
>         step take site1
>         step chooseleaf firstn 2 type host
>         step emit
>         step take site2
>         step chooseleaf firstn 2 type host
>         step emit
> }
>
> With this rule the primary is always on site1 (and the same problem
> exists with my EC rule, BTW), which does not look perfect, i.e. reads
> will always go to OSDs on site1. Is it a known limitation?

The CRUSH rule can be whatever you want it to be, as long as it
provides two copies in each site. But keeping the primaries on one
side has a lot of utility if you are running it with the expectation
of doing live failover in the case of a disaster: if all your live
services run in DC1, you can make sure they serve reads out of the
local data center and do not have to send writes in *both* directions.
-Greg

> [1] https://docs.ceph.com/en/latest/rados/operations/stretch-mode/
> [2] https://docs.ceph.com/en/latest/rados/operations/stretch-mode/#stretch-mode-limitations
>
> --
> Mykola Golub
> _______________________________________________
> Dev mailing list -- dev@xxxxxxx
> To unsubscribe send an email to dev-leave@xxxxxxx
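[Editorial aside, not part of the original thread: a toy illustration, in Python rather than CRUSH syntax, of why the two-emit rule above pins primaries to site1. CRUSH concatenates the OSDs produced by each "step emit" in rule order, and the primary of a PG is the first entry of the resulting acting set (absent overrides such as `ceph osd primary-affinity`). The OSD ids below are made up for the example.]

```python
def acting_set(site1_osds, site2_osds):
    """Model the two 'step emit's: site1's selection is concatenated
    before site2's, so acting[0] is always a site1 OSD."""
    return list(site1_osds) + list(site2_osds)

# Hypothetical selections for one PG: two hosts chosen per site.
acting = acting_set([3, 7], [12, 15])
primary = acting[0]

assert acting == [3, 7, 12, 15]
assert primary in {3, 7}   # the primary always lands in site1
```

As the reply argues, this one-sidedness is a feature rather than a bug when all live clients run next to site1; if read locality on site2 mattered, adjusting primary affinity on site1's OSDs would be the usual knob to reach for.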