Hi Paul, thanks for the detailed answer!

On Tuesday, 14.08.2018, 12:23 +0200, Paul Emmerich wrote:
> IIRC this will create a rule that tries to select n independent
> datacenters. Check the actual generated rule to validate this.

This is what it did, and looking back it makes sense. ;) My understanding of the CRUSH failure domain implementation changed daily while I was changing the rules, and at the time of writing to the ceph-users list my understanding was wrong.

> I think the only way to express "3 copies across two data centers" is
> by explicitly using the two data centers in the rule, as in:
>
> (pseudo code)
> take dc1
> chooseleaf 1 type host
> emit
> take dc2
> chooseleaf 2 type host
> emit
>
> which will always place 1 copy in dc1 and 2 in dc2. A rule like
>
> take default
> choose 2 type datacenter
> chooseleaf 2 type host
> emit
>
> will select a total of 4 hosts in two different data centers (2 hosts
> per dc).

This is how I solved it in the end; my question was aimed at doing this without manually editing the CRUSH map. It's okay that it does not work that way, but I'd rather ask than give up on keeping it simple. I used your second approach:

    step take default
    step choose firstn 0 type datacenter
    step chooseleaf firstn 2 type host
    step emit

This way the rule will keep working if another datacenter is added (wishful thinking) and replication is increased from four to six, without changing the CRUSH rule.

> But the real problem here is that 2 data centers in one Ceph cluster
> is just a poor fit for Ceph in most scenarios. 3 would be fine. Two
> independent clusters and async rbd-mirror or rgw synchronization
> would also be fine.
>
> But one cluster in two data centers and replicating via CRUSH just
> isn't how it works.
> Maybe you are looking for something like "3 independent racks" and
> you happen to have two racks in each dc? Really depends on your setup
> and requirements.
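For anyone finding this in the archives: in a decompiled CRUSH map, the complete rule would look roughly like the sketch below. The rule name, id, and min/max_size values are placeholders; check what your own decompiled map uses.

    rule replicated_datacenter {
        id 1
        type replicated
        min_size 2
        max_size 10
        step take default
        step choose firstn 0 type datacenter
        step chooseleaf firstn 2 type host
        step emit
    }

With "choose firstn 0 type datacenter", CRUSH descends into as many datacenter buckets as there are replicas (capped at the number of datacenters), then picks up to 2 distinct hosts in each, which is why size 4 lands as 2+2 here.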
Let me explain the setup a bit: I have two datacenters, each with one rack containing three nodes. I run with four replicated copies, so there are two copies in each datacenter and one datacenter can burn down without data loss.

I am aware of the monitor quorum problem. Yes, the storage is down if the "wrong" datacenter burns down. Still, we simply don't have a third one, and staying online during a disaster is not that important to us; we can recover manually. What matters most is that the data survives.

Thanks again for your input, highly appreciated!

Torsten

--
Torsten Casselt, IT-Sicherheit, Leibniz Universität IT Services
Tel: +49-(0)511-762-799095    Schlosswender Str. 5
Fax: +49-(0)511-762-3003      D-30159 Hannover
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com