Re: Combining erasure coding and replication?

Hi Brett,

How far apart are your buildings, and what is the network connectivity between them? I am going to assume they are close and that you have plenty of bandwidth.

There are a couple of options depending on the protocol and the distance between the buildings.

You could build an EC cluster with a profile like k=4, m=6, i.e. 4 data chunks and 6 coding (parity) chunks, 10 chunks in total (assuming you have 5 nodes in each DC).
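Something like this would create the profile and a pool that uses it (the profile/pool names and PG count here are just examples):

    # 4 data + 6 coding chunks, spread across hosts
    ceph osd erasure-code-profile set ec-4-6 k=4 m=6 crush-failure-domain=host
    # EC pool using that profile (size the PG count for your cluster)
    ceph osd pool create media-ec 1024 1024 erasure ec-4-6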

With this setup you can then lose an entire DC and still have access to your data, with protection to spare (5 surviving chunks against the 4 needed to read). This is achieved by building a CRUSH rule which places half of the chunks in one DC and the other half in the other DC.
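A rough sketch of such a rule, assuming your CRUSH map has two datacenter buckets under the default root (bucket names and the rule id are just examples; you would add it by decompiling/recompiling the map with crushtool):

    rule ec_two_dc {
        id 1
        type erasure
        step set_chooseleaf_tries 5
        step take default
        step choose indep 2 type datacenter
        step chooseleaf indep 5 type host
        step emit
    }

That picks both DCs and then 5 hosts in each, giving the 10 placements the 4+6 profile needs; attach it with ceph osd pool set media-ec crush_rule ec_two_dc.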

You would need to think about where to put a third monitor in this case, so that you can keep quorum if one building goes down.

The downside of this is that you could be reading data from either DC; I'm not sure where your workloads run.

An alternative to this is to use LRC (locally repairable erasure coding), which adds local parity chunks so that data can be rebuilt within a single DC. This helps when it comes to rebuilds, but it doesn't help with where data is read from.
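A sketch with the lrc plugin (the parameter values are just illustrative, and crush-locality=datacenter assumes datacenter buckets exist in your CRUSH map):

    # k=4 data + m=2 global parity, plus one local parity chunk per
    # group of l=3 chunks; locality keeps single-OSD rebuilds inside a DC
    ceph osd erasure-code-profile set lrc-two-dc plugin=lrc \
        k=4 m=2 l=3 \
        crush-locality=datacenter crush-failure-domain=host

One caveat: a layout like this makes routine rebuilds cheaper, but unlike the 4+6 profile above it would not necessarily survive the loss of a whole DC, so check the failure maths before picking it.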

The other option would be replication: build two separate clusters and configure RGW (S3) multisite replication to the second site. Or, if using CephFS, set up rsync to replicate; not pretty, but an option.
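For the rsync route, a minimal example (paths and host name are hypothetical; run it from cron or a systemd timer):

    # mirror the primary CephFS tree to the second cluster's CephFS
    # -aHAX preserves hard links, ACLs and xattrs; --delete keeps it an exact mirror
    rsync -aHAX --delete /mnt/cephfs/media/ dr-host:/mnt/cephfs-dr/media/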

Darren



From: Brett Randall <brett.randall@xxxxxxxxx>
Date: Wednesday, 10 June 2020 at 15:20
To: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
Subject:  Combining erasure coding and replication?
Hi all

We are looking at setting up our first ever Ceph cluster to replace Gluster as our media asset storage and production system. The Ceph cluster will have 5 PB of usable storage. Whether we use it as object storage, or put CephFS in front of it, is still TBD.

Obviously we’re keen to protect this data well. Our current Gluster setup utilises RAID-6 on each of the nodes, and then we have a single replica of each brick. The Gluster bricks are split between buildings so that the replica is guaranteed to be in the other premises. By doing it this way, we guarantee that we can tolerate a decent number of disk or node failures (even an entire building) before we lose either connectivity or data.

Our concern with Ceph is the cost of having three replicas. Storage may be cheap, but I’d rather not buy ANOTHER 5 PB for a third replica if there are ways to do this more efficiently. Site-level redundancy is important to us, so we can’t simply create an erasure-coded volume across two buildings: if we lose power to one building, the entire array would become unavailable. Likewise, we can’t simply have a single replica: our fault tolerance would drop well below what it is right now.

Is there a way to use both erasure coding AND replication at the same time in Ceph to mimic the architecture we currently have in Gluster? I know we COULD just create RAID-6 volumes on each node and use the entire volume as a single OSD, but this is not the recommended way to use Ceph. So is there some other way?

Apologies if this is a nonsensical question; I’m still trying to wrap my head around Ceph, CRUSH maps, placement rules, volume types, etc. etc.!

TIA

Brett

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



