RE: Comments on Ceph distributed parity implementation

Benoit,

I agree with you. Redundancy and availability, as well as encoding and reconstruction performance, are related, and there are inherent trade-offs among them. The beautiful thing about Ceph is its support for multiple pools with different attributes. Today we have pools with differing performance and redundancy characteristics; erasure coding may add complexity here. Ideally, a user could create one pool that favors capacity, another that favors availability, and so on.

Jim, I like the image of your "space-overhead/fault-tolerance curve," because I see our problem as multiple points on that curve rather than a single one. From a performance perspective, I agree that the CPU burden is not really an issue, but inter-node communication is.
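To make the curve concrete for readers of the archive: for a systematic (k, m) code that stores k data and m parity fragments, the space overhead is (k+m)/k and any m concurrent losses are tolerated, while a classical Reed-Solomon repair of one fragment reads k survivors. A minimal, illustrative sketch (not Ceph code; the (k, m) values are examples only):

```python
# Illustrative sketch only (not Ceph code): a few points on the
# space-overhead/fault-tolerance curve for systematic (k, m) codes.

def overhead(k, m):
    """Storage overhead of k data + m parity fragments vs. raw data size."""
    return (k + m) / k

# Each tuple is one point on the curve. (1, 2) is the analogue of 3x
# replication; repair of one lost fragment reads k surviving fragments
# for a classical Reed-Solomon code.
for k, m in [(1, 2), (2, 1), (4, 2), (6, 3), (10, 4)]:
    print(f"k={k:2d} m={m}: {overhead(k, m):.2f}x space, "
          f"tolerates {m} losses, repair reads {k} fragments")
```

This is where the inter-node communication cost shows up: the larger k is, the cheaper the storage but the more fragments each repair must pull across the network.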

All the best,
Paul

On June 18, 2013 at 7:23 AM, James Plank wrote:
> 
> Hi all -- thank you for including me on this thread, although I have
> little substantive to add.  At the moment, my sole focus is finishing a
> journal paper about GF implementations, with a concomitant GF-complete
> release to accompany it.  I agree that the CPU burden of the GF arithmetic
> will not be a bottleneck in your system, regardless of which
> implementation you use, as long as you stay at or below GF(2^16).  If you
> want to go higher, GF-complete will help.  When we put out a new release
> (the code will be ready within two weeks, however, the documentation is
> lagging), I'll let you know.  I think LRC is a nice coding paradigm,
> although I imagine that it has IP issues with Microsoft.  I don't have
> first-hand experience with network/regenerating codes, and I'll be honest
> -- there have been so many papers in that realm that I am not up to date
> on them.
> 
> Is there a question on which you'd like some help?  It sounds as though
> you are at two decision points: What code should you use, and at which
> point on the space-overhead/fault-tolerance curve would you like to be?
> 
> Best wishes,
> 
> Jim
> ----------
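A note for readers: the Galois-field arithmetic Jim refers to is multiplication in a field such as GF(2^8). A minimal, unoptimized sketch of such a multiply follows (GF-complete itself uses table- and SIMD-based implementations; the reduction polynomial 0x11b below is one common choice, not necessarily the one any particular coder uses):

```python
def gf256_mul(a, b, poly=0x11b):
    """Carry-less "Russian peasant" multiply in GF(2^8).

    Bytes are treated as polynomials over GF(2); the product is
    reduced modulo the degree-8 polynomial encoded by `poly`
    (here x^8 + x^4 + x^3 + x + 1).
    """
    p = 0
    while b:
        if b & 1:
            p ^= a          # add (XOR) the current shift of a
        b >>= 1
        a <<= 1
        if a & 0x100:       # degree-8 overflow: reduce modulo poly
            a ^= poly
    return p
```

Table-driven coders precompute log/antilog tables from exactly this operation, which is part of why the per-byte CPU cost is so low in practice at GF(2^8) or GF(2^16).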
> 
> On Jun 18, 2013, at 3:44 AM, Benoît Parrein wrote:
> 
> > Hi Paul,
> >
> > thank you for your message
> >
> > From my point of view, LRC focuses on the repair problem: how do you
> reconstruct a destroyed node so that the distributed system maintains
> the same availability?
> > In that context they can even go below a 1x repair rate by introducing
> local parities on top of classical Reed-Solomon blocks (though they pay
> some extra storage overhead); see Alex Dimakis's excellent papers on
> that. But, still from my point of view, the same relationship between
> redundancy and availability holds (if you assume a binomial model for
> your losses).
> >
> > best
> > bp
> >
> >
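The binomial model Benoît mentions can be written down directly: with n fragments, any k of which suffice to read the data, and independent per-fragment failure probability p, data is lost only when more than n-k fragments fail. A sketch (the parameter values at the bottom are illustrative, not measurements):

```python
from math import comb

def loss_probability(n, k, p):
    """P(data loss) for an (n, k) code under independent fragment
    failures with probability p: lost when more than n - k fail."""
    m = n - k  # number of tolerated failures
    return sum(comb(n, i) * (p ** i) * ((1 - p) ** (n - i))
               for i in range(m + 1, n + 1))

# 3x replication (n=3, k=1) vs. a 1.4x-overhead RS code (n=14, k=10),
# with an illustrative per-fragment failure probability of 1%.
print(loss_probability(3, 1, 0.01))    # replication
print(loss_probability(14, 10, 0.01))  # Reed-Solomon
```

Under this (simplified, independence-assuming) model the wider code achieves a lower loss probability at less than half the storage overhead, which is the redundancy/availability relationship being discussed.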
> > On 17/06/2013 at 18:55, Paul Von-Stamwitz wrote:
> >> Loic,
> >>
> >> As Benoit points out, Mojette uses discrete geometry rather than
> algebra, so simple XOR is all that is needed.
> >>
> >> Benoit,
> >>
> >> Microsoft's paper states that their [12,2,2] LRC provides better
> availability than 3x replication at 1.33x efficiency. 1.5x is certainly
> a good number. I'm just pointing out that better efficiency can be had
> without losing availability.
> >>
> >> All the best,
> >> Paul
> >
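To make the "local parity" idea from the LRC discussion concrete: the data blocks are split into groups, each with its own XOR parity, so a single lost block is rebuilt from its group alone rather than from all k data blocks. A toy sketch using plain XOR (group sizes are illustrative; Microsoft's [12,2,2] code additionally carries two global Reed-Solomon parities, omitted here for brevity):

```python
import os
from functools import reduce

def xor(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

# 12 data blocks split into two local groups of 6, each with a local
# XOR parity (LRC-style layout; global parities omitted).
data = [os.urandom(16) for _ in range(12)]
groups = [data[:6], data[6:]]
local_parity = [xor(g) for g in groups]

# Repair block 3 from its local group only: 6 reads instead of 12.
survivors = [b for i, b in enumerate(groups[0]) if i != 3]
repaired = xor(survivors + [local_parity[0]])
assert repaired == data[3]
```

This is where the repair-bandwidth saving Benoît describes comes from: the common single-failure case touches only one group, at the cost of the extra storage for the local parities.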

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



