Re: Erasure code library summary

Alex Elsayed <eternaleye@xxxxxxxxx> · Wed, 19 Jun 2013 00:47:56 -0700

Loic Dachary wrote:

> 
> 
> On 06/19/2013 03:14 AM, Alex Elsayed wrote:
>> Alex Elsayed wrote:
>> 
>>> Loic Dachary wrote:
>>>
>>>> Hi Ceph,
>>>>
>>> <snip>
>>>> Reed-Solomon coding family is the only one that can keep the chunks
>>>> unencoded and therefore concatenable.
>>> <snip>
>>>
>>> In my understanding, this is not strictly true - any 'systematic' code
>>> will have the unencoded chunks remain available in this manner, and any
>>> non-systematic linear code can be transformed into a systematic code
>>> with the same minimum distance. Fountain codes are often explicitly
>>> constructed to maintain this property, as in the case of RaptorQ [RFC
>>> 6330].
>>>
>>> https://en.wikipedia.org/wiki/Systematic_code
>> 
>> ...that said, Reed-Solomon is to the best of my knowledge the only space-
>> optimal such code.
> 
> What does "space-optimal" mean? Does it mean that Reed-Solomon will use
> less disk space than fountain codes to code the same number of parity
> chunks?

Optimal (for an erasure code) means that if you have K symbols of real data, 
then *any* K symbols of the output of the erasure code will let you recover 
it.
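To make "any K symbols of the output" concrete, here is a minimal Python sketch (an illustration, not any particular library's implementation) of a systematic Reed-Solomon-style code built by polynomial interpolation. Assumptions: arithmetic in the prime field GF(257) with one field element per symbol, chosen only for readability; real implementations usually work over GF(2^8). The helper names are hypothetical.

```python
# Minimal sketch of a systematic, MDS (Reed-Solomon-style) code over GF(P).
# Assumption: P = 257 and one field element per symbol, for readability only.
import itertools

P = 257  # prime modulus (illustrative choice)


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Division in GF(P) via Fermat's little theorem: den^(P-2) = den^-1.
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, n):
    """Systematic encoding: output symbol x is the polynomial through
    (i, data[i]) evaluated at x, so outputs 0..k-1 are the data itself."""
    k = len(data)
    points = list(enumerate(data))
    return data[:] + [lagrange_eval(points, x) for x in range(k, n)]


def decode(received, k):
    """Recover the k data symbols from ANY k (index, value) pairs."""
    points = received[:k]
    return [lagrange_eval(points, x) for x in range(k)]


data = [10, 20, 30, 40]          # K = 4 source symbols
codeword = encode(data, n=7)     # N = 7: 4 data + 3 parity
assert codeword[:4] == data      # systematic: data is kept unencoded
# MDS: every one of the 35 subsets of 4 symbols recovers the data.
for subset in itertools.combinations(range(7), 4):
    assert decode([(i, codeword[i]) for i in subset], k=4) == data
```

The first K output symbols are the data itself (the "systematic" property discussed earlier in the thread), and every K-subset decodes, which is exactly the optimality described above.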

Current fountain codes (RaptorQ is best-of-breed right now as far as I know) 
require K + epsilon received symbols to decode. Epsilon is zero for the vast 
majority of cases, but some K-sized subsets of the total list of encoded 
symbols need a non-zero epsilon, so getting exactly the same level of 
assurance requires more parity data.
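The K + epsilon behaviour can be seen with a much simpler relative of RaptorQ: a purely random linear fountain code over GF(2). This is a hedged illustration only. RaptorQ's construction is far more elaborate and its overhead is much smaller than this toy code's; the names and parameters below are hypothetical.

```python
# Toy random linear fountain code over GF(2) (NOT RaptorQ): each encoded
# symbol is the XOR of a random subset of the K source symbols, described by
# a K-bit mask. The receiver can decode exactly when its masks have rank K.
import random


def gf2_rank(masks):
    """Rank over GF(2) of bitmask row vectors, by Gaussian elimination."""
    basis = {}  # leading bit -> reduced row having that leading bit
    for row in masks:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]  # cancel the leading bit and keep reducing
    return len(basis)


def decode_success_rate(k, epsilon, trials, rng):
    """Fraction of trials in which k + epsilon random symbols suffice."""
    ok = 0
    for _ in range(trials):
        masks = [rng.getrandbits(k) for _ in range(k + epsilon)]
        ok += gf2_rank(masks) == k
    return ok / trials


rng = random.Random(1)
for eps in (0, 2, 5, 10):
    print("epsilon =", eps, "success rate =",
          decode_success_rate(k=32, epsilon=eps, trials=2000, rng=rng))
```

For this toy code the failure probability with K + epsilon symbols is at most about 2^-epsilon: small, but for some received subsets not zero, which is the gap the MDS property closes.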

Optimal erasure codes are also known as "Maximum Distance Separable" codes.
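The name comes from the Singleton bound on the minimum distance d of a linear code with n output symbols and k data symbols. An MDS code meets the bound with equality, and since a code of minimum distance d can fill in any d - 1 erasures, that is the same as saying any k of the n symbols suffice:

```latex
d \le n - k + 1, \qquad
d = n - k + 1 \;\Longrightarrow\; \text{any } n - k \text{ erasures are recoverable.}
```

For example, k = 4 data chunks plus 3 parity chunks (n = 7) survive the loss of any 3 of them.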

>> An interesting option, however, might be to use a
>> fountain code over the network when distributing either replicas *or*
>> parity chunks, so that losses can be recovered with <1 full chunk
>> retransmission.
> 
> I would be grateful if you could expand on this idea. I don't get it :-)

First, a couple of caveats. One: doing this over TCP would yield no real 
benefit. In fact, any reliable transport makes this mostly pointless, since 
the idea is to avoid retransmitting not only chunks, but packets as well.

Let's assume 4MB chunks. Encode the chunk as a single source block (Raptor 
terminology, see the RFC), with a symbol size chosen to fit 1 (one) symbol 
comfortably into a single packet of whatever unreliable, unordered transport 
you're using. DCCP is basically perfect for this.

Send the symbols, taking advantage of RaptorQ being a systematic code: the 
first symbols sent are simply the unmodified chunk. If they all get through, 
the receiver closes the connection and you're done.

If one or more packets failed to get through, those are erasures - so the 
receiver leaves the connection open. The sender can be really simplistic - 
'keep encoding and sending symbols as long as the connection is open.' Once 
the receiver has enough symbols to recover, it closes the connection.
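The sender/receiver loop just described can be simulated in a few lines. This is a hedged sketch: it substitutes a systematic random linear fountain over GF(2) for RaptorQ (same "data first, then endless repair symbols" shape, much worse efficiency), assumes independent random packet loss, and treats the receiver's "close" as instantaneous. All names are hypothetical.

```python
# Stand-in code (NOT RaptorQ): symbol esi < K is source symbol esi; later
# symbols XOR a random subset of source symbols chosen by seeding with esi.
import random

K = 16            # source symbols per chunk (illustrative)
LOSS_RATE = 0.1   # assumed independent packet loss probability


def mask_for(esi):
    """Which source symbols encoded symbol `esi` combines, as a K-bit mask."""
    if esi < K:
        return 1 << esi                              # systematic: raw data
    return random.Random(esi).getrandbits(K) or 1    # repair: never empty


def encode_symbol(source, esi):
    mask, value = mask_for(esi), 0
    for i in range(K):
        if mask >> i & 1:
            value ^= source[i]
    return value


class Receiver:
    def __init__(self):
        self.basis = {}  # leading bit -> (mask, value), in echelon form

    def add(self, esi, value):
        mask = mask_for(esi)
        while mask:  # a mask reduced to 0 was redundant and is dropped
            lead = mask.bit_length() - 1
            if lead not in self.basis:
                self.basis[lead] = (mask, value)
                return
            bmask, bvalue = self.basis[lead]
            mask, value = mask ^ bmask, value ^ bvalue

    def done(self):
        return len(self.basis) == K

    def decode(self):
        out = [0] * K
        for lead in sorted(self.basis):  # back-substitute, lowest bit first
            mask, value = self.basis[lead]
            for j in range(lead):
                if mask >> j & 1:
                    value ^= out[j]
            out[lead] = value
        return out


def transfer(source, rng):
    """Sender keeps encoding and sending while the connection is open."""
    rx, sent, esi = Receiver(), 0, 0
    while not rx.done():                 # receiver hasn't closed yet
        value = encode_symbol(source, esi)
        sent += 1
        if rng.random() >= LOSS_RATE:    # datagram survived the channel
            rx.add(esi, value)
        esi += 1
    return rx.decode(), sent


rng = random.Random(42)
source = [rng.getrandbits(32) for _ in range(K)]   # 16 random 32-bit symbols
decoded, sent = transfer(source, rng)
assert decoded == source
print("packets sent:", sent, "for", K, "source symbols")
```

Note that the sender never learns which packets were lost; it only needs to know whether the connection is still open.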

In cases of no loss, overhead is zero. In cases of some loss, the number of 
additional packets is equal to the number of lost packets plus a (very 
small) potential overhead. The real benefit here is this:

There is no longer any need to wait a full round trip for an acknowledgement 
(or a timeout) to realize a packet was lost.
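One way to quantify the overhead claim, assuming independent packet loss with probability p (an assumption; real losses are often bursty): the expected number of datagrams sent until K + epsilon of them arrive is (K + epsilon) / (1 - p), so the excess over K is the expected number of lost packets plus epsilon. The function name and numbers are hypothetical.

```python
def expected_packets(k: int, loss_rate: float, epsilon: int = 0) -> float:
    """Expected datagrams sent until k + epsilon arrive, with i.i.d. loss.

    The number of sends needed for r successes at success probability q is
    negative-binomially distributed with mean r / q.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    return (k + epsilon) / (1.0 - loss_rate)


# Hypothetical numbers: 2996 symbols per chunk, 1% loss, no extra overhead.
print(round(expected_packets(2996, 0.01), 1))  # -> 3026.3
```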

This is the use case fountain codes are optimized for: coding for 
transmission. Creating a new symbol is an O(1) operation for RaptorQ, while 
for Reed-Solomon it's O(K), linear in the number of symbols in the source 
block.

Another neat property of Raptor codes is that you can have multiple, 
unsynchronized senders. For replicas, once one replica has received the 
chunk, it could join in as an extra sender to the others, accelerating them 
*linearly* without anyone needing to track who had which symbols of the chunk.
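The multiple-sender property can be illustrated with the same kind of toy code. Hedged sketch: a random linear fountain over GF(2) stands in for RaptorQ, and each symbol's content is a pure function of a randomly chosen 32-bit symbol ID, so senders never coordinate and the receiver does not care who sent what. How real RaptorQ encoding symbol IDs would be divided between senders is not addressed here; all names are hypothetical.

```python
import random

K = 64  # source symbols per chunk (illustrative)


def mask_for(symbol_id):
    """Which source symbols a given symbol ID combines; same for every sender."""
    return random.Random(symbol_id).getrandbits(K)


def receive_from(senders):
    """Take symbols round-robin from independent senders until decodable.

    Returns how many symbols each sender transmitted. Duplicate IDs (rare
    with 32-bit random IDs) are merely redundant and do no harm.
    """
    basis, counts = {}, [0] * len(senders)  # GF(2) echelon basis, tallies
    while len(basis) < K:
        for s, sender_rng in enumerate(senders):
            counts[s] += 1
            row = mask_for(sender_rng.getrandbits(32))
            while row:                      # reduce against the basis
                lead = row.bit_length() - 1
                if lead not in basis:
                    basis[lead] = row
                    break
                row ^= basis[lead]
            if len(basis) == K:             # receiver closes the connection
                break
    return counts


one = receive_from([random.Random(1)])
two = receive_from([random.Random(2), random.Random(3)])
print("one sender: ", one)
print("two senders:", two)  # similar total; each sender sends about half
```

Because the total number of symbols needed stays near K regardless of who sends them, adding senders divides the per-sender work, which is where the linear speedup comes from.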

Multicast, too.

> Cheers
> 
