Re: CEPH Erasure Encoding + OSD Scalability

Loic Dachary <loic@xxxxxxxxxxx> · Fri, 20 Sep 2013 14:33:59 +0200

Hi Andreas,

Great work on these benchmarks! It's definitely an incentive to improve as much as possible. Could you push or send the scripts and the sequence of operations you used? I'll reproduce this locally while getting rid of the extra copy. It would be useful to capture that in a script that can be conveniently run from the teuthology integration tests to check for performance regressions.

Regarding the 3P implementation, in my opinion it would be very valuable for some people who prefer low CPU consumption. And I'm eager to see more than one plugin in the erasure code plugin directory ;-)

Cheers

On 20/09/2013 13:35, Andreas Joachim Peters wrote:
> Hi Loic, 
> 
> I now have some benchmarks on a Xeon 2.27 GHz 4-core with gcc 4.4 (-O2) for ENCODING based on the CEPH Jerasure port.
> I measured objects from 128k to 512 MB with random contents. Results are stable for these object sizes; if you encode 1 GB objects you see slowdowns due to caching inefficiencies.
> 
> I quote only the benchmark for ErasureCodeJerasureReedSolomonRAID6 (3,2), since the others are significantly slower (2-3x), and for my 3P(3,2,1) implementation, which provides the same redundancy level as RS-RAID6 [3,2] (double disk failure) but uses more space (100% overhead vs 66% for RS-RAID6).
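
For reference, the space overhead figures can be checked with a few lines of Python. This is only a sketch: it assumes overhead means parity chunks divided by data chunks, and that 3P(3,2,1) stores 3 data chunks plus 2 + 1 = 3 parity chunks, which is my reading of the 100% figure. The function name is made up for illustration.

```python
def space_overhead(data_chunks: int, parity_chunks: int) -> float:
    """Return extra storage as a fraction of the original object size."""
    if data_chunks <= 0 or parity_chunks < 0:
        raise ValueError("need at least one data chunk and no negative parity")
    return parity_chunks / data_chunks


# RS-RAID6 (3,2): 2 parity chunks for 3 data chunks.
rs_3_2 = space_overhead(3, 2)
# 3P(3,2,1): assumed to mean 3 data chunks and 2 + 1 parity chunks.
three_p = space_overhead(3, 2 + 1)
# RS-RAID6 (10,2), the space-optimized layout mentioned further down.
rs_10_2 = space_overhead(10, 2)

for name, value in (("RS(3,2)", rs_3_2), ("3P(3,2,1)", three_p), ("RS(10,2)", rs_10_2)):
    print(f"{name}: {value:.0%} overhead")
```
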
> 
> The effect of out.c_str() is significant: it accounts for a factor-of-2 slowdown for the best jerasure algorithm for [3,2].
> 
> Averaged results for an object size of 4 MB:
> 
> 1) Erasure CRS [3,2] - 2.6 ms buffer preparation (out.c_str()) - 2.4 ms encoding => ~780 MB/s
> 2) 3P [3,2,1] - 0.005 ms buffer preparation (3P adjusts the padding in the algorithm) - 0.87 ms encoding => ~4.4 GB/s
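
As a sanity check on how the timings relate to the throughput figures, here is a small sketch. It assumes throughput is the object size divided by the sum of buffer preparation and encoding time, and that 4 MB means 4,000,000 bytes; both are my assumptions, so the results only approximate the quoted numbers.

```python
def throughput_mb_per_s(object_bytes: float, prep_ms: float, encode_ms: float) -> float:
    """Return throughput in MB/s (10^6 bytes per second) for one encode call."""
    total_seconds = (prep_ms + encode_ms) / 1000.0
    if total_seconds <= 0:
        raise ValueError("total time must be positive")
    return object_bytes / total_seconds / 1e6


OBJECT_BYTES = 4e6  # assumption: 4 MB taken as 4 * 10^6 bytes

# Timings quoted above for a 4 MB object.
jerasure = throughput_mb_per_s(OBJECT_BYTES, prep_ms=2.6, encode_ms=2.4)
three_p = throughput_mb_per_s(OBJECT_BYTES, prep_ms=0.005, encode_ms=0.87)

# Share of the Jerasure time spent in buffer preparation (the out.c_str() copy).
prep_share = 2.6 / (2.6 + 2.4)

print(f"Jerasure: {jerasure:.0f} MB/s, 3P: {three_p:.0f} MB/s, prep share: {prep_share:.0%}")
```

With these assumptions the copy is about half of the Jerasure time, which matches the factor-of-2 slowdown mentioned above.
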
> 
> I think it pays off to avoid the copy in the encoding if it does not matter for the buffer handling upstream and pad only the last chunk.
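
To illustrate the idea of padding only where needed, here is a minimal Python sketch, not the Ceph code. It splits an object into k equal-sized chunks, hands out zero-copy views for every chunk that is already full, and copies only a short chunk into a zero-padded buffer. The function name, the 16-byte alignment and the zero padding are assumptions for illustration.

```python
def split_for_encoding(data: bytes, k: int, alignment: int = 16):
    """Split data into k equal-sized chunks, copying only chunks that need padding."""
    if k <= 0 or alignment <= 0:
        raise ValueError("k and alignment must be positive")
    view = memoryview(data)
    # Chunk size: ceil(len / k), rounded up to a multiple of the alignment.
    chunk_size = -(-len(data) // k)
    chunk_size = max(alignment, -(-chunk_size // alignment) * alignment)
    chunks = []
    for i in range(k):
        piece = view[i * chunk_size:(i + 1) * chunk_size]
        if len(piece) == chunk_size:
            # Full chunk: reference the original buffer, no copy.
            chunks.append(piece)
        else:
            # Short (or empty) chunk: copy into a zero-filled buffer.
            padded = bytearray(chunk_size)
            padded[:len(piece)] = piece
            chunks.append(memoryview(padded))
    return chunks


payload = bytes(range(100))
parts = split_for_encoding(payload, k=3)
print([len(p) for p in parts])
```

For small objects more than one chunk can end up short, so the sketch checks each chunk rather than only the last one.
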
> 
> The last thing I tested is how performance scales with the number of cores, running 4 tests in parallel:
> 
> Jerasure (3,2) tops out at ~2.0 GB/s for a 4-core CPU (Xeon 2.27 GHz).
> 3P(3,2,1) tops out at ~8 GB/s for a 4-core CPU (Xeon 2.27 GHz).
> 
> I also implemented the decoding for 3P, but haven't yet tested all reconstruction cases. There is probably room for improvement using AVX support for XOR operations in both implementations.
> 
> Before I invest more time, do you think it is useful to have this fast 3P algorithm for double disk failures with 100% space overhead? I ask because I believe that people will always optimize for space and would rather use something like (10,2), even if the performance degrades and CPU consumption goes up. Let me know, no problem in any case!
> 
> Finally I tested some combinations for ErasureCodeJerasureReedSolomonRAID6:
> 
> (3,2), (4,2), (6,2), (8,2) and (10,2) all run at around 780-800 MB/s.
> 
> Cheers Andreas.

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
