Re: Hello, I have a question about the erasure code translator, hope someone can give me some advice, thank you!

Hi,

On Mon, Apr 8, 2019 at 8:50 AM PSC <1173701037@xxxxxx> wrote:

Hi, I am a storage software developer interested in Gluster, and I am trying to improve its read/write performance.


I noticed that Gluster uses a Vandermonde matrix in the erasure code encoding and decoding process. However, it is quite complicated to generate the inverse of a Vandermonde matrix, which is necessary for decoding. The cost is O(n³).


That's not true, actually. A Vandermonde matrix can be inverted in O(n²), as the current code does (look at ec_method_matrix_inverse() in ec-method.c). Additionally, the current code caches inverted matrices, so in normal circumstances there shouldn't be many inverse computations. A new inverted matrix is only needed when something changes (a brick dies or comes online).
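To illustrate why the Vandermonde structure admits an O(n²) inverse, here is a sketch over exact rationals rather than GF(2^8) (so it is not the actual ec_method_matrix_inverse() code, just the same idea): the columns of the inverse are the coefficients of Lagrange basis polynomials, which can all be extracted from one master polynomial by synthetic division.

```python
from fractions import Fraction

def vandermonde_inverse(xs):
    """Invert V[i][j] = xs[i]**j in O(n^2) operations.

    Column i of the inverse holds the coefficients of the Lagrange basis
    polynomial L_i(t) = prod_{k!=i} (t - xs[k]) / (xs[i] - xs[k]),
    since sum_j V[m][j] * inv[j][i] = L_i(xs[m]) = 1 if m == i else 0.
    """
    n = len(xs)
    # Master polynomial P(t) = prod_k (t - xs[k]), built in O(n^2).
    master = [Fraction(1)]
    for x in xs:
        nxt = [Fraction(0)] * (len(master) + 1)
        for deg, c in enumerate(master):
            nxt[deg + 1] += c        # multiply by t
            nxt[deg] -= x * c        # multiply by -x
        master = nxt
    inv = [[Fraction(0)] * n for _ in range(n)]
    for i, x in enumerate(xs):
        # Synthetic division: Q_i(t) = P(t) / (t - x), O(n) per root.
        q = [Fraction(0)] * n
        q[n - 1] = master[n]
        for deg in range(n - 2, -1, -1):
            q[deg] = master[deg + 1] + x * q[deg + 1]
        # P'(x) = Q_i(x), evaluated by Horner's rule in O(n).
        d = Fraction(0)
        for c in reversed(q):
            d = d * x + c
        for j in range(n):
            inv[j][i] = q[j] / d     # coefficient of t^j in L_i(t)
    return inv
```

The same two tricks (one shared master polynomial, O(n) division per row) are what make the specialized inverse quadratic instead of cubic.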
 


Using a Cauchy matrix instead can greatly cut down the cost of finding an inverse matrix, to O(n²).
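For reference, the closed-form Cauchy inverse behind that O(n²) claim can be sketched over exact rationals (illustrative names; production EC code would run the same formulas in GF(2^8)). Four tables of products are precomputed in O(n²), after which every entry of the inverse costs O(1):

```python
from fractions import Fraction

def cauchy_inverse(xs, ys):
    """Invert the Cauchy matrix C[i][j] = 1/(xs[i] - ys[j]) in O(n^2).

    Uses the classical closed form: with A(t) = prod(t - xs[k]) and
    B(t) = prod(t - ys[k]), the inverse entry is
    inv[i][j] = (xs[j] - ys[i]) * A_j(ys[i]) * B_i(xs[j]),
    where A_j(z) = A(z) / (A'(xs[j]) * (z - xs[j])) and similarly B_i.
    Requires all xs, ys distinct and xs[i] != ys[j].
    """
    n = len(xs)

    def prod_except(roots, z, skip=None):
        p = Fraction(1)
        for k, r in enumerate(roots):
            if k != skip:
                p *= z - r
        return p

    # Four O(n) tables, each entry an O(n) product: O(n^2) total.
    A_at_y  = [prod_except(xs, y) for y in ys]                       # A(y_i)
    Ad_at_x = [prod_except(xs, x, skip=j) for j, x in enumerate(xs)] # A'(x_j)
    B_at_x  = [prod_except(ys, x) for x in xs]                       # B(x_j)
    Bd_at_y = [prod_except(ys, y, skip=i) for i, y in enumerate(ys)] # B'(y_i)

    inv = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a_j = A_at_y[i] / (Ad_at_x[j] * (ys[i] - xs[j]))
            b_i = B_at_x[j] / (Bd_at_y[i] * (xs[j] - ys[i]))
            inv[i][j] = (xs[j] - ys[i]) * a_j * b_i
    return inv
```

So both matrix families invert in O(n²); the practical difference between them is elsewhere (e.g. density of the matrix and multiplication cost), not in inversion asymptotics.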


I used the Intel Intelligent Storage Acceleration Library (ISA-L) to replace the original EC encode/decode part of Gluster, and it reduced the encode and decode time to about 50% of the original.


How did you test that? I also did some tests long ago and I didn't observe that difference.

Doing a raw test of the encoding/decoding performance of the current code using Intel AVX2 extensions, it's able to process 7.6 GiB/s on a single core of an Intel Xeon Silver 4114 when the L1 cache is used. Without relying on the internal cache, it performs at 3.9 GiB/s. Does ISA-L provide better performance for a matrix of the same size (a 4+2 non-systematic matrix)?
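For comparability, a raw encode benchmark of that kind can be structured roughly like the following sketch. The xor_fold stand-in workload is purely illustrative; a real test would call the actual EC encode kernel (Gluster's or ISA-L's) on the same buffers:

```python
import time

def measure_throughput(encode, buf_size=1 << 20, rounds=64):
    """Feed `rounds` buffers of `buf_size` bytes through `encode`
    and return GiB/s of data processed."""
    data = bytes(buf_size)
    start = time.perf_counter()
    for _ in range(rounds):
        encode(data)
    elapsed = time.perf_counter() - start
    return (buf_size * rounds) / elapsed / (1 << 30)

def xor_fold(data):
    # Stand-in workload: XOR one byte per 4 KiB page. A real comparison
    # would invoke the EC encode routine here instead, once per buffer,
    # with identical buffer sizes for both implementations.
    acc = 0
    for b in memoryview(data)[::4096]:
        acc ^= b
    return acc

gibs = measure_throughput(xor_fold)
```

Whether the source buffers fit in L1/L2 cache makes a large difference (as the 7.6 vs 3.9 GiB/s numbers above show), so both implementations must be measured with the same buffer sizes to be comparable.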


However, when I test the whole system, the read/write performance is almost the same as the original Gluster.


Yes, there are many more things involved in the read and write operations in Gluster. For the particular case of EC, having to deal with many bricks simultaneously (6 in this case) means that it's very sensitive to network latency and communication delays, and this is probably one of the biggest contributors. There are also some other small latencies added by other xlators.


I tested it on three machines acting as servers. Each one has two bricks, both on SSDs, so there are 6 bricks in total. Two of them are used as coding bricks. That is a 4+2 disperse volume configuration.
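For context, such a layout corresponds to a volume created along these lines (hostnames and brick paths are placeholders; Gluster will warn that placing two bricks of one disperse set on the same server reduces fault tolerance):

```shell
gluster volume create testvol disperse 6 redundancy 2 \
    server1:/bricks/ssd1 server1:/bricks/ssd2 \
    server2:/bricks/ssd1 server2:/bricks/ssd2 \
    server3:/bricks/ssd1 server3:/bricks/ssd2
gluster volume start testvol
```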


The network cards are 10000 Mbps (10 GbE). Theoretically they can support read and write speeds faster than 1000 MB/s.
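A back-of-envelope check of that ceiling, assuming the client's NIC is the bottleneck and counting the on-wire expansion of a 4+2 volume for writes:

```python
# 10 GbE line rate, in decimal MB/s as NIC speeds are quoted.
line_rate = 10_000 / 8                  # 1250 MB/s of raw payload
# A 4+2 disperse write puts 6 fragments on the wire for every
# 4 fragments' worth of user data: a 6/4 redundancy overhead.
read_ceiling = line_rate                # reads fetch only 4 fragments' worth
write_ceiling = line_rate / (6 / 4)     # ~833 MB/s
print(round(read_ceiling), round(write_ceiling))  # 1250 833
```

So even before protocol overhead and latency, the theoretical write ceiling for this configuration is closer to 833 MB/s than 1250 MB/s.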


The actual read performance is about 492 MB/s.

The actual write performance is about 336 MB/s.


The original code reads at 461 MB/s and writes at 322 MB/s.


Can someone give me some advice on how to improve its performance? Which part is the critical bottleneck, if it's not the EC translator?


I timed the translators, and it shows the EC translator takes just 7% of the whole read/write process. I know that some translators run asynchronously, so the real percentage may be somewhat larger than that.
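That 7% figure by itself is consistent with the whole-system result via Amdahl's law: speeding up only the EC stage can never remove more than that 7% of the total time. A quick check:

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: end-to-end speedup when `fraction` of total time
    is accelerated by a factor of `local_speedup`."""
    return 1 / ((1 - fraction) + fraction / local_speedup)

# Halving encode time (as the ISA-L swap did) moves the needle ~3.6%;
# even an infinitely fast encoder is capped at ~7.5% end to end.
print(round(overall_speedup(0.07, 2), 3),
      round(overall_speedup(0.07, float("inf")), 3))  # 1.036 1.075
```

A ~3.6% predicted gain matches the observed 461 → 492 MB/s read improvement (~6.7%) to within the accuracy of such a rough model, which points at the other 93% (network round trips, other xlators) as the place to look.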


Thank you sincerely for your patience in reading my question!

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel
