Dear Eugen,

I guess what you are suggesting is something like k+m with m>=k+2, for
example k=4, m=6. Then one can distribute 5 shards per DC and sustain the
loss of an entire DC while still having full, redundant access to the data.

Now, a long time ago I attended a lecture on error-correcting codes
(Reed-Solomon codes). From what I remember, the computational complexity
of these codes explodes at least exponentially with m. Out of curiosity,
how does m>3 perform in practice? What's the CPU requirement per OSD?
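Just to make sure we are talking about the same layout, here is roughly
the profile and CRUSH rule I have in mind. This is only an untested
sketch; the pool/rule names, IDs and PG numbers are made up, and it
assumes the CRUSH tree has a datacenter level:

# EC profile: k=4 data + m=6 coding shards = 10 shards per object.
# Raw overhead is (k+m)/k = 2.5x, compared with 3x for three replicas.
ceph osd erasure-code-profile set ec-k4-m6 \
    k=4 m=6 \
    plugin=jerasure technique=reed_sol_van

# CRUSH rule (added to the decompiled CRUSH map): pick 2 datacenters,
# then 5 hosts in each, i.e. 5 shards per DC.
rule ec_two_dc {
    id 99
    type erasure
    min_size 10
    max_size 10
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step choose indep 2 type datacenter
    step chooseleaf indep 5 type host
    step emit
}

# Pool using the profile and the rule. Losing one DC leaves 5 shards,
# which (if I remember the defaults correctly) still satisfies the
# pool's default min_size of k+1 = 5, so I/O should continue.
ceph osd pool create ec-test 1024 1024 erasure ec-k4-m6 ec_two_dc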
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 27 March 2020 08:33:45
To: ceph-users@xxxxxxx
Subject: Re: Combining erasure coding and replication?

Hi Brett,

> Our concern with Ceph is the cost of having three replicas. Storage
> may be cheap but I’d rather not buy ANOTHER 5pb for a third replica
> if there are ways to do this more efficiently. Site-level redundancy
> is important to us so we can’t simply create an erasure-coded volume
> across two buildings – if we lose power to a building, the entire
> array would become unavailable.

Can you elaborate on that? Why is EC not an option? We have installed
several clusters with two datacenters that are resilient to losing a
whole DC (and additional disks if required). So it is basically a matter
of choosing the right EC profile. Or did I misunderstand something?

Quoting Brett Randall <brett.randall@xxxxxxxxx>:

> Hi all
>
> Had a fun time trying to join this list, hopefully you don’t get
> this message 3 times!
>
> On to Ceph… We are looking at setting up our first ever Ceph cluster
> to replace Gluster as our media asset storage and production system.
> The Ceph cluster will have 5pb of usable storage. Whether we use it
> as object storage, or put CephFS in front of it, is still TBD.
>
> Obviously we’re keen to protect this data well. Our current Gluster
> setup utilises RAID-6 on each of the nodes, and then we have a single
> replica of each brick. The Gluster bricks are split between buildings
> so that the replica is guaranteed to be in another premises. By doing
> it this way, we guarantee that we can sustain a decent number of disk
> or node failures (even an entire building) before we lose both
> connectivity and data.
>
> Our concern with Ceph is the cost of having three replicas. Storage
> may be cheap, but I’d rather not buy ANOTHER 5pb for a third replica
> if there are ways to do this more efficiently. Site-level redundancy
> is important to us, so we can’t simply create an erasure-coded volume
> across two buildings – if we lose power to a building, the entire
> array would become unavailable. Likewise, we can’t simply have a
> single replica – our fault tolerance would drop way down from what it
> is right now.
>
> Is there a way to use both erasure coding AND replication at the same
> time in Ceph to mimic the architecture we currently have in Gluster?
> I know we COULD just create RAID-6 volumes on each node and use the
> entire volume as a single OSD, but this is not the recommended way to
> use Ceph. So is there some other way?
>
> Apologies if this is a nonsensical question, I’m still trying to wrap
> my head around Ceph, CRUSH maps, placement rules, volume types, etc
> etc!
>
> TIA
>
> Brett

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx