On 27/03/2020 09:56, Eugen Block wrote:
> Hi,
>
>> I guess what you are suggesting is something like k+m with m>=k+2, for
>> example k=4, m=6. Then, one can distribute 5 shards per DC and sustain
>> the loss of an entire DC while still having full access to redundant
>> storage.
>
> that's exactly what I mean, yes.

We have an EC pool of 5+7, which works that way. Currently we have no
demand for it, but it should do the job.

Cheers
/Simon

>
>> Now, a long time ago I was in a lecture about error-correcting codes
>> (Reed-Solomon codes). From what I remember, the computational
>> complexity of these codes explodes at least exponentially with m. Out
>> of curiosity, how does m>3 perform in practice? What's the CPU
>> requirement per OSD?
>
> Such a setup usually would be considered for archiving purposes so the
> performance requirements aren't very high, but so far we haven't heard
> any complaints performance-wise.
> I don't have details on CPU requirements at hand right now.
>
> Regards,
> Eugen
>
>
> Zitat von Frank Schilder <frans@xxxxxx>:
>
>> Dear Eugen,
>>
>> I guess what you are suggesting is something like k+m with m>=k+2, for
>> example k=4, m=6. Then, one can distribute 5 shards per DC and sustain
>> the loss of an entire DC while still having full access to redundant
>> storage.
>>
>> Now, a long time ago I was in a lecture about error-correcting codes
>> (Reed-Solomon codes). From what I remember, the computational
>> complexity of these codes explodes at least exponentially with m. Out
>> of curiosity, how does m>3 perform in practice? What's the CPU
>> requirement per OSD?
>>
>> Best regards,
>>
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Eugen Block <eblock@xxxxxx>
>> Sent: 27 March 2020 08:33:45
>> To: ceph-users@xxxxxxx
>> Subject: Re: Combining erasure coding and replication?
>>
>> Hi Brett,
>>
>>> Our concern with Ceph is the cost of having three replicas. Storage
>>> may be cheap but I’d rather not buy ANOTHER 5pb for a third replica
>>> if there are ways to do this more efficiently. Site-level redundancy
>>> is important to us so we can’t simply create an erasure-coded volume
>>> across two buildings – if we lose power to a building, the entire
>>> array would become unavailable.
>>
>> can you elaborate on that? Why is EC not an option? We have installed
>> several clusters with two datacenters resilient to losing a whole dc
>> (and additional disks if required). So it's basically the choice of
>> the right EC profile. Or did I misunderstand something?
>>
>>
>> Zitat von Brett Randall <brett.randall@xxxxxxxxx>:
>>
>>> Hi all
>>>
>>> Had a fun time trying to join this list, hopefully you don’t get
>>> this message 3 times!
>>>
>>> On to Ceph… We are looking at setting up our first ever Ceph cluster
>>> to replace Gluster as our media asset storage and production system.
>>> The Ceph cluster will have 5pb of usable storage. Whether we use it
>>> as object-storage, or put CephFS in front of it, is still TBD.
>>>
>>> Obviously we’re keen to protect this data well. Our current Gluster
>>> setup utilises RAID-6 on each of the nodes and then we have a single
>>> replica of each brick. The Gluster bricks are split between
>>> buildings so that the replica is guaranteed to be in another
>>> premises. By doing it this way, we guarantee that we can have a
>>> decent number of disk or node failures (even an entire building)
>>> before we lose both connectivity and data.
>>>
>>> Our concern with Ceph is the cost of having three replicas. Storage
>>> may be cheap but I’d rather not buy ANOTHER 5pb for a third replica
>>> if there are ways to do this more efficiently. Site-level redundancy
>>> is important to us so we can’t simply create an erasure-coded volume
>>> across two buildings – if we lose power to a building, the entire
>>> array would become unavailable. Likewise, we can’t simply have a
>>> single replica – our fault tolerance would drop way down from what it
>>> is right now.
>>>
>>> Is there a way to use both erasure coding AND replication at the
>>> same time in Ceph to mimic the architecture we currently have in
>>> Gluster? I know we COULD just create RAID6 volumes on each node and
>>> use the entire volume as a single OSD, but this is not the
>>> recommended way to use Ceph. So is there some other way?
>>>
>>> Apologies if this is a nonsensical question, I’m still trying to
>>> wrap my head around Ceph, CRUSH maps, placement rules, volume types,
>>> etc etc!
>>>
>>> TIA
>>>
>>> Brett
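
For reference, a minimal sketch of the layout Frank and Eugen describe
above: a 4+6 profile whose ten shards are split five per datacenter, so
an entire DC can fail and every PG still holds k+1 shards. The profile
name, rule name, rule id and the assumption that the CRUSH tree contains
two buckets of type "datacenter" under the default root are illustrative
assumptions, not taken from the thread:

    # erasure-code profile: 4 data + 6 coding chunks = 10 shards per object
    ceph osd erasure-code-profile set ec46 k=4 m=6 crush-failure-domain=host

    # the per-site shard placement needs a hand-written CRUSH rule:
    # pick two datacenter buckets, then five hosts inside each of them
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # append to crush.txt (id 99 is just an unused example id):
    rule ec46_two_dc {
            id 99
            type erasure
            step set_chooseleaf_tries 5
            step set_choose_tries 100
            step take default
            step choose indep 2 type datacenter
            step chooseleaf indep 5 type host
            step emit
    }

    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new

The same shape should cover Eugen's 5+7 pool as well, just with
"chooseleaf indep 6" instead of 5.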
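
Continuing the sketch, the pool is then created against that profile and
rule. The availability argument comes down to min_size: recent releases
default an EC pool to min_size = k+1, here 5, which is exactly what
survives a DC outage, so the PGs stay active and one shard of redundancy
remains beyond the k needed to read. Pool name and PG counts below are
placeholders:

    # pg_num/pgp_num are placeholders, size them for the real cluster
    ceph osd pool create ecpool 1024 1024 erasure ec46 ec46_two_dc

    # sanity checks: the pool should report the profile and min_size 5
    ceph osd pool get ecpool erasure_code_profile
    ceph osd pool get ecpool min_size

Whether exactly min_size surviving shards keeps writes flowing depends on
the release in use, so it is worth rehearsing a site outage (stop all
OSDs in one DC) before trusting the layout with 5 PB of media.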