Wrong assumptions about disperse

Hi all,

I've seen in many places the belief that disperse, or erasure coding in general, is slow because of the complex or costly math involved. It's true that there's an overhead compared to the simple copy that replica does, but this overhead is much smaller than many people think.

The math used by disperse, if tested alone outside gluster, is much faster than it seems. AFAIK the real problem of EC is the communications layer: it adds a lot of latency, and having to simultaneously communicate with and coordinate 6 or more bricks has a big impact.
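
To give an idea of what that math actually looks like: each encoded byte is essentially a dot product in GF(2^8), and a single multiplication in that field reduces to shifts and XORs. This is just an illustrative sketch in C, not the code ec uses (ec works with a different, faster representation):

#include <stdint.h>

/* Multiply two elements of GF(2^8) using the polynomial
 * x^8+x^4+x^3+x^2+1 (0x11D), common in Reed-Solomon codes. Each step
 * is a shift, a test and an XOR: cheap operations for any CPU. */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;

    while (b != 0) {
        if (b & 1)
            r ^= a;
        /* Multiply 'a' by x and reduce modulo the field polynomial. */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0x00));
        b >>= 1;
    }

    return r;
}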

Erasure coding also suffers from partial writes, which require a read-modify-write cycle. However, this is completely avoided in many situations where the volume is optimally configured and writes are aligned and in blocks of multiples of 4096 bytes (typical for VMs, databases and many other workloads). It could even be avoided in other situations by taking advantage of the write-behind xlator (not done yet).
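
The condition that decides whether a write triggers that read-modify-write cycle is very simple. A minimal sketch (names are illustrative, not ec's actual ones; 'stripe_size' is the amount of user data covered by one full stripe):

#include <stdbool.h>
#include <stdint.h>

/* A write that starts and ends on stripe boundaries replaces whole
 * stripes, so the previous contents never need to be read back. */
static bool write_needs_rmw(uint64_t offset, uint64_t size,
                            uint64_t stripe_size)
{
    return (offset % stripe_size != 0) || (size % stripe_size != 0);
}

For example, with a 2048-byte stripe (4 data bricks, 512-byte blocks), any aligned write that is a multiple of 4096 bytes covers whole stripes and skips the cycle entirely.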

I've used a single core of two machines to test the raw math: one quite limited (Atom D525 at 1.8 GHz) and one more powerful, though not a top-end CPU (Xeon E5-2630L at 2.0 GHz).

Common parameters:

* non-systematic Vandermonde matrix (the same one used by ec)
* algorithm slightly slower than the one used by ec (I haven't implemented some optimizations in the test program, but I think the difference should be very small)
* buffer size: 128 KiB
* number of iterations: 16384
* total size processed: 2 GiB
* results in MiB/s for a single core

Config   Atom   Xeon
  2+1     633   1856
  4+1     405   1203
  4+2     324    984
  4+3     275    807
  8+2     227    611
  8+3     202    545
  8+4     182    501
 16+3     116    303
 16+4     111    295
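
For reference, the test loop has this shape (encode_buffer() here is just a placeholder for the Vandermonde encoding of one buffer into the k+r fragments, not a real function):

#include <stddef.h>
#include <stdio.h>
#include <time.h>

#define BUF_SIZE   (128 * 1024)   /* 128 KiB */
#define ITERATIONS 16384          /* 16384 * 128 KiB = 2 GiB */

void encode_buffer(const char *buf, size_t size);  /* placeholder */

int main(void)
{
    static char buffer[BUF_SIZE];
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERATIONS; i++)
        encode_buffer(buffer, BUF_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) +
                  (t1.tv_nsec - t0.tv_nsec) / 1e9;

    /* Total data processed, in MiB. */
    double mib = (double)BUF_SIZE * ITERATIONS / (1024.0 * 1024.0);

    printf("%.0f MiB/s\n", mib / secs);
    return 0;
}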

The same tests using Intel SSE2 extensions (not present in EC yet, but the patch is in review):

Config   Atom   Xeon
  2+1     821   3047
  4+1     767   2246
  4+2     629   1887
  4+3     535   1632
  8+2     466   1237
  8+3     423   1104
  8+4     388   1044
 16+3     289    675
 16+4     271    637
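
The gain comes from the width of the registers: the encoding ultimately reduces to XORs over large buffers, and SSE2 processes 128 bits per instruction. A minimal sketch of the core operation (assuming 16-byte aligned buffers and a length multiple of 16; this is not the actual patch):

#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* dst ^= src over 'len' bytes, 128 bits at a time. */
static void xor_block_sse2(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i += 16) {
        __m128i a = _mm_load_si128((const __m128i *)(dst + i));
        __m128i b = _mm_load_si128((const __m128i *)(src + i));
        _mm_store_si128((__m128i *)(dst + i), _mm_xor_si128(a, b));
    }
}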

With AVX2 it should be even faster, but my machines don't support it.

Encoding is much faster still when a systematic matrix is used. For example, a 16+4 configuration using SSE2 on a Xeon core can encode at 3865 MiB/s. However, this won't make a big difference inside gluster.
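
The reason is that with a systematic generator matrix of the form [ I | P ], the first k fragments are verbatim copies of the data, so only the r parity fragments involve any field math. A sketch (gf_dot_product() is a hypothetical helper, not an ec function):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: GF(2^8) dot product of one matrix row with the
 * k data fragments, writing one output fragment. */
void gf_dot_product(uint8_t *out, uint8_t **data, const uint8_t *row,
                    int k, size_t frag_size);

static void encode_systematic(uint8_t **frags, uint8_t **data,
                              uint8_t **parity_rows, int k, int r,
                              size_t frag_size)
{
    /* Data fragments: plain copies, no math at all. */
    for (int i = 0; i < k; i++)
        memcpy(frags[i], data[i], frag_size);

    /* Only the r parity fragments pay the encoding cost. */
    for (int j = 0; j < r; j++)
        gf_dot_product(frags[k + j], data, parity_rows[j], k, frag_size);
}

In a 16+4 configuration only 4 of the 20 fragments need to be computed, which explains the jump in the raw numbers.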

Currently, EC encoding/decoding is not the bottleneck of disperse for small/medium configurations. For big configurations on slow machines it could have some impact, but I don't have the resources to test those configurations properly.

Regards,

Xavi