Re: Combining erasure coding and replication?

Hi,

> I guess what you are suggesting is something like k+m with m>=k+2, for example k=4, m=6. Then, one can distribute 5 shards per DC and sustain the loss of an entire DC while still having full access to redundant storage.

That's exactly what I mean, yes.
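For the archives, here is roughly what that looks like. This is only a sketch; the profile, rule and pool names, the rule id and the bucket names are examples and have to be adapted to the actual CRUSH tree:

# EC profile with k=4, m=6; placement across the DCs is handled by a custom CRUSH rule
ceph osd erasure-code-profile set ec46 k=4 m=6 crush-failure-domain=host

# export and decompile the CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# add a rule like this (assuming two 'datacenter' buckets under the 'default' root):
rule ec46_two_dc {
    id 10                                  # any unused rule id
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step choose indep 2 type datacenter    # pick both datacenters
    step chooseleaf indep 5 type host      # 5 shards on distinct hosts in each DC
    step emit
}

# compile and inject the modified map, then create the pool with profile and rule
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
ceph osd pool create ecpool 256 256 erasure ec46 ec46_two_dc

If I'm not mistaken, min_size for the EC pool defaults to k+1 = 5 here, so with one DC down the remaining 5 shards keep the pool available, just without spare redundancy until the DC is back.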

> Now, a long time ago I was in a lecture about error-correcting codes (Reed-Solomon codes). From what I remember, the computational complexity of these codes explodes at least exponentially with m. Out of curiosity, how does m>3 perform in practice? What's the CPU requirement per OSD?

Such a setup would usually be considered for archiving purposes, so the performance requirements aren't very high, but so far we haven't heard any complaints performance-wise.
I don't have details on the CPU requirements at hand right now.
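If you want to get a feel for it on your own hardware, a simple approach is to write to a test EC pool with rados bench and watch the ceph-osd CPU usage on the hosts at the same time (the pool name below is just an example):

# 60 seconds of writes with 16 concurrent ops, keep the objects for the read test
rados bench -p ecpool 60 write -t 16 --no-cleanup
# sequential reads of the benchmark objects
rados bench -p ecpool 60 seq -t 16
# remove the benchmark objects again
rados -p ecpool cleanup

Repeating that with profiles for different m values should show quickly whether the encoding overhead matters for your workload.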

Regards,
Eugen


Zitat von Frank Schilder <frans@xxxxxx>:

Dear Eugen,

I guess what you are suggesting is something like k+m with m>=k+2, for example k=4, m=6. Then, one can distribute 5 shards per DC and sustain the loss of an entire DC while still having full access to redundant storage.

Now, a long time ago I was in a lecture about error-correcting codes (Reed-Solomon codes). From what I remember, the computational complexity of these codes explodes at least exponentially with m. Out of curiosity, how does m>3 perform in practice? What's the CPU requirement per OSD?

Best regards,

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Eugen Block <eblock@xxxxxx>
Sent: 27 March 2020 08:33:45
To: ceph-users@xxxxxxx
Subject:  Re: Combining erasure coding and replication?

Hi Brett,

Our concern with Ceph is the cost of having three replicas. Storage
may be cheap but I’d rather not buy ANOTHER 5 PB for a third replica
if there are ways to do this more efficiently. Site-level redundancy
is important to us so we can’t simply create an erasure-coded volume
across two buildings – if we lose power to a building, the entire
array would become unavailable.

Can you elaborate on that? Why is EC not an option? We have
installed several clusters spanning two datacenters that are
resilient to losing a whole DC (and additional disks if required).
So it's basically a matter of choosing the right EC profile. Or did
I misunderstand something?


Zitat von Brett Randall <brett.randall@xxxxxxxxx>:

Hi all

Had a fun time trying to join this list, hopefully you don’t get
this message 3 times!

On to Ceph… We are looking at setting up our first-ever Ceph cluster
to replace Gluster as our media asset storage and production system.
The Ceph cluster will have 5 PB of usable storage. Whether we use it
as object storage, or put CephFS in front of it, is still TBD.

Obviously we’re keen to protect this data well. Our current Gluster
setup utilises RAID-6 on each of the nodes and then we have a single
replica of each brick. The Gluster bricks are split between
buildings so that the replica is guaranteed to be in another
premises. By doing it this way, we guarantee that we can have a
decent number of disk or node failures (even an entire building)
before we lose both connectivity and data.

Our concern with Ceph is the cost of having three replicas. Storage
may be cheap but I’d rather not buy ANOTHER 5 PB for a third replica
if there are ways to do this more efficiently. Site-level redundancy
is important to us so we can’t simply create an erasure-coded volume
across two buildings – if we lose power to a building, the entire
array would become unavailable. Likewise, we can’t simply have a
single replica – our fault tolerance would drop way below what it
is right now.

Is there a way to use both erasure coding AND replication at the
same time in Ceph to mimic the architecture we currently have in
Gluster? I know we COULD just create RAID-6 volumes on each node and
use the entire volume as a single OSD, but this is not the
recommended way to use Ceph. So is there some other way?

Apologies if this is a nonsensical question, I’m still trying to
wrap my head around Ceph, CRUSH maps, placement rules, volume types,
etc etc!

TIA

Brett


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



