Dear Eugen,

I guess what you are suggesting is something like k+m with m>=k+2, for
example k=4, m=6. Then one can distribute 5 shards per DC and sustain the
loss of an entire DC while still having full, redundant access to the data.

Now, a long time ago I attended a lecture on error-correcting codes
(Reed-Solomon codes). From what I remember, the computational complexity
of these codes explodes at least exponentially with m. Out of curiosity,
how does m>3 perform in practice? What's the CPU requirement per OSD?
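Just to make sure we are talking about the same layout, here is roughly
the profile and CRUSH rule I have in mind. This is only an untested
sketch; the pool/rule names, IDs and PG numbers are made up, and it
assumes the CRUSH tree has a datacenter level:

# EC profile: k=4 data + m=6 coding shards = 10 shards per object.
# Raw overhead is (k+m)/k = 2.5x, compared with 3x for three replicas.
ceph osd erasure-code-profile set ec-k4-m6 \
    k=4 m=6 \
    plugin=jerasure technique=reed_sol_van

# CRUSH rule (added to the decompiled CRUSH map): pick 2 datacenters,
# then 5 hosts in each, i.e. 5 shards per DC.
rule ec_two_dc {
    id 99
    type erasure
    min_size 10
    max_size 10
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step choose indep 2 type datacenter
    step chooseleaf indep 5 type host
    step emit
}

# Pool using the profile and the rule. Losing one DC leaves 5 shards,
# which (if I remember the defaults correctly) still satisfies the
# pool's default min_size of k+1 = 5, so I/O should continue.
ceph osd pool create ec-test 1024 1024 erasure ec-k4-m6 ec_two_dc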
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 27 March 2020 08:33:45
To: ceph-users@xxxxxxx
Subject: Re: Combining erasure coding and replication?

Hi Brett,

> Our concern with Ceph is the cost of having three replicas. Storage
> may be cheap but I’d rather not buy ANOTHER 5pb for a third replica
> if there are ways to do this more efficiently. Site-level redundancy
> is important to us so we can’t simply create an erasure-coded volume
> across two buildings – if we lose power to a building, the entire
> array would become unavailable.

Can you elaborate on that? Why is EC not an option? We have installed
several clusters with two datacenters that are resilient to losing a
whole DC (and additional disks if required). So it is basically a matter
of choosing the right EC profile. Or did I misunderstand something?

Quoting Brett Randall <brett.randall@xxxxxxxxx>:

> Hi all
>
> Had a fun time trying to join this list, hopefully you don’t get
> this message 3 times!
>
> On to Ceph… We are looking at setting up our first ever Ceph cluster
> to replace Gluster as our media asset storage and production system.
> The Ceph cluster will have 5pb of usable storage. Whether we use it
> as object storage, or put CephFS in front of it, is still TBD.
>
> Obviously we’re keen to protect this data well. Our current Gluster
> setup utilises RAID-6 on each of the nodes, and then we have a single
> replica of each brick. The Gluster bricks are split between buildings
> so that the replica is guaranteed to be in another premises. By doing
> it this way, we guarantee that we can sustain a decent number of disk
> or node failures (even an entire building) before we lose both
> connectivity and data.
>
> Our concern with Ceph is the cost of having three replicas. Storage
> may be cheap, but I’d rather not buy ANOTHER 5pb for a third replica
> if there are ways to do this more efficiently. Site-level redundancy
> is important to us, so we can’t simply create an erasure-coded volume
> across two buildings – if we lose power to a building, the entire
> array would become unavailable. Likewise, we can’t simply have a
> single replica – our fault tolerance would drop way down from what it
> is right now.
>
> Is there a way to use both erasure coding AND replication at the same
> time in Ceph to mimic the architecture we currently have in Gluster?
> I know we COULD just create RAID-6 volumes on each node and use the
> entire volume as a single OSD, but this is not the recommended way to
> use Ceph. So is there some other way?
>
> Apologies if this is a nonsensical question, I’m still trying to wrap
> my head around Ceph, CRUSH maps, placement rules, volume types, etc
> etc!
>
> TIA
>
> Brett

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx