Hi Loic, i think the best is to read along the sources. It is very readable! https://github.com/madiator/HadoopUSC/blob/developUSC/src/contrib/raid/src/java/org/apache/hadoop/raid/SimpleRegeneratingCode.java If there is a high interest in this, you could port the code from Java to C++ and use the GF-complete or Intel GF-multiplication implementation for the used GF multiplications. Cheers Andreas. ________________________________________ From: ceph-devel-owner@xxxxxxxxxxxxxxx [ceph-devel-owner@xxxxxxxxxxxxxxx] on behalf of Loic Dachary [loic@xxxxxxxxxxx] Sent: 30 June 2014 11:06 To: Koleos Fuscus Cc: Ceph Development Subject: Re: erasure code and coefficients Hi koleosfuscus, It clarifies it enough to raise a question : where can I read code (or an algorithm if not code) that chose the coefficients desirable to implement what is suggested in the Xorbas paper ? Cheers On 30/06/2014 10:18, Koleos Fuscus wrote: > Hi Loic, > > I am happy to contribute with some clarifications. In fact, > erasure/reliability concepts are not blocking my progress with the > reliability model at ceph. It is the ceph model itself that has some > parts not clear to me, and nobody had time yet to review the state > model diagram that I published on the wiki. :( > Anyway, regarding coefficients here is a bit of background. > Coefficients are the numbers that multiply your variables inside an > equation. In a toy example, to solve the equation ax^2+bx+c=0 you need > to find the coefficients a,b,c that make the equation valid. > In the context of Reed Solomon, the definition of coefficients is a > bit more confusing. In the original design, the message x is > interpreted as coefficients of a polynomial p. But in subsequents > interpretations the message x is seen as the values of the polynomial > p evaluated at the first k points a1..ak. Such interpretation is > apparently a bit less efficient but desirable because you can > construct a systematic code. > In the context of xorbas, you are constructing a code on top of Reed > Solomon. The codewords are seen as values, and the idea is to get > coefficients c1..c10 that also satisfy s1+s2+s3=0 (take this as a > missing introduction to my previous message) > > Cheers, > > koleosfuscus > > ________________________________________________________________ > "My reply is: the software has no known bugs, therefore it has not > been updated." > Wietse Venema > > > On Sun, Jun 29, 2014 at 8:44 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote: >> Hi koleofuscus, >> >> Thanks for the explanation : it is very conforting to know that you understand this :-) At the risk of being thick, I must say that the very notion of "coefficient" eludes me. What are they ? >> >> Cheers >> >> On 29/06/2014 20:38, Koleos Fuscus wrote: >>> Hello Loic, >>> Dimakis (one of the authors of xorbas) is talking about coefficients >>> because they want to find a way to reduce the storage overhead used >>> with LRC. In the simple case used in Fig. 2, a RS (k=10, m=4) has >>> 14/10 storage overhead but when using LRC, the overhead increases to >>> 17/10 because you also need to store s1, s2 and s3. Basically, the >>> idea is to find specific coefficients c1..c10 that permit to obtain s3 >>> through s1 and s2. In other words, get some s1 and s2 that when xored >>> together give s3. If you find such coefficients, you don't need to >>> store s3 and the storage overhead of LRC is 1.6x instead of 1.7x. >>> >>> Dimakis said that for the Reed Solomon implementation used in HDFS >>> RAID they can simple set all coefficients with value '1' and use xor. >>> >>> This cannot be the case of the Reed Solomon implemented by you (I >>> understood is the jerasure library by Plank) but that I am not sure. I >>> guess we need the help of a mathematician or at least check and >>> compare both implementations. >>> >>> Finally, apparently for xorbas they only implemented the configuration >>> RS(10,4) and not other combinations. Unfortunately, the wiki page of >>> the project is empty http://wiki.apache.org/hadoop/ErasureCode and the >>> main page says 'erasure coding under development'. >>> >>> I recommend you to watch the xorbas presentation video >>> http://smahesh.com/HadoopUSC/ (a very clear explanation of xorbas) and >>> use the Dimakis wiki page to check the large collection of paper they >>> have: http://storagewiki.ece.utexas.edu/ >>> >>> Best, >>> >>> koleosfuscus >>> >>> ________________________________________________________________ >>> "My reply is: the software has no known bugs, therefore it has not >>> been updated." >>> Wietse Venema >>> >>> >>> On Sun, Jun 29, 2014 at 11:30 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote: >>>> Hi Andreas, >>>> >>>> In http://anrg.usc.edu/~maheswaran/Xorbas.pdf I get the idea of computing local coding chunks the way it is implemented in https://github.com/ceph/ceph/pull/1921 (i.e. delegating encoding / decoding to other plugins). However, there are theoretical aspects of the paper that I do not understand and I'm hoping you can shed some light on it. In particular, I don't know what "coefficients" are about. For instance in the context of Figure 2 caption : "The main theoretical challenge is to choose the coeffi cients c(i) to maximize the fault tolerance of the code." >>>> >>>> Would you recommend a paper to read to better understand this ? Also I'd like to understand what "coefficients" mean in the context of jerasure or if they do not apply. >>>> >>>> Thanks for you help :-) >>>> >>>> -- >>>> Loïc Dachary, Artisan Logiciel Libre >>>> >> >> -- >> Loïc Dachary, Artisan Logiciel Libre >> -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html