Re: Remove an artificial limitation of disperse volume

Hi Pavan,

On 07/02/17 14:51, Nag Pavan Chilakam wrote:
You can always go for x3 (3 replica copies) to address the need you have described.
EC volumes can be thought of as RAID for the purpose of understanding them, but don't treat it as an apples-to-apples comparison.
RAID-4/6 (mostly) relies on XOR'ing bits (so basic addition and subtraction), but EC involves a more complex algorithm (Reed-Solomon).

In fact, RAID-5 and RAID-6 can be seen as implementations of Reed-Solomon, but for cases so simple that they are computed directly with XORs and no one talks about Reed-Solomon.

For example, one possible Reed-Solomon matrix that implements RAID-5 is equivalent to computing the redundancy as the XOR of all data blocks. This is precisely what RAID-5 does. RAID-6 is very similar.
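
To make the RAID-5 case concrete, here is a minimal Python sketch of XOR parity. It is only an illustration of the idea, not the EC translator's code; the block contents and the helper name xor_blocks are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three made-up data blocks of equal size.
data = [b"ABCD", b"EFGH", b"IJKL"]

# RAID-5 style redundancy: a single parity block, the XOR of all data blocks.
parity = xor_blocks(data)

# Pretend the second block is lost. XOR-ing the surviving data blocks with
# the parity block gives the missing block back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

The same XOR routine is used for both encoding and recovery, which is why nobody needs to mention Reed-Solomon for this case.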

The current implementation of Reed-Solomon in EC uses only XORs to compute the parity and recover the data. The average number of XORs per input byte needed to compute the redundancy depends on the CPU extensions used (none, SSE, AVX) and on the configuration, as the following table shows:

          x86_64   SSE    AVX
     2+1   0.79    0.39   0.20
     4+2   1.76    0.88   0.44
     4+3   2.06    1.03   0.51
     8+3   3.40    1.70   0.85
     8+4   3.71    1.86   0.93
    16+4   6.34    3.17   1.59

Note that with AVX and a 16+4 configuration it uses only 1.59 XORs per input byte on average to compute all 4 redundancies. Only plain x86_64 needs more than one XOR per input byte for each redundancy, and only in two configurations: 8+3 (3.40 / 3 ≈ 1.13) and 16+4 (6.34 / 4 = 1.585).
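
The per-redundancy figures can be rechecked with a few lines of Python. This is only arithmetic on the numbers copied from the table; the names used (xors_per_input_byte, per_redundancy, above_one) are invented for the sketch.

```python
# Average XORs per input byte, copied from the table above.
columns = ("x86_64", "SSE", "AVX")
xors_per_input_byte = {
    "2+1": (0.79, 0.39, 0.20),
    "4+2": (1.76, 0.88, 0.44),
    "4+3": (2.06, 1.03, 0.51),
    "8+3": (3.40, 1.70, 0.85),
    "8+4": (3.71, 1.86, 0.93),
    "16+4": (6.34, 3.17, 1.59),
}


def per_redundancy(config, xors):
    """Divide the XOR count by the number of redundancy fragments (the '+N')."""
    redundancy = int(config.split("+")[1])
    return xors / redundancy


# Collect every (configuration, column) cell that needs more than one XOR
# per input byte for each redundancy fragment.
above_one = [
    (config, column)
    for config, row in xors_per_input_byte.items()
    for column, xors in zip(columns, row)
    if per_redundancy(config, xors) > 1.0
]

for config, column in above_one:
    print(config, column)
```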

There's a technical document explaining how EC works internally here, though it's oriented to developers and people who already know the basics about erasure codes:

https://review.gluster.org/#/c/15637/4/doc/developer-guide/ec-implementation.md

Xavi



----- Original Message -----
From: "Olivier Lambert" <lambert.olivier@xxxxxxxxx>
To: "gluster-users" <gluster-users@xxxxxxxxxxx>
Sent: Tuesday, 7 February, 2017 6:46:37 PM
Subject:  Remove an artificial limitation of disperse volume

Hi everyone!

I'm currently working on implementing Gluster on XenServer/Xen Orchestra.

I want to expose some Gluster features (in the easiest possible way to
the user).

Therefore, I want to expose only the "distributed/replicated" and
"disperse" modes. From what I understand, they work differently.
Let's take a simple example.

Setup: 6x nodes with 1x 200GB disk each.

* Disperse with redundancy 2 (4+2): I can lose **any 2 of all my
disks**. Total usable space is 800GB. It's a kind of RAID6 (or RAIDZ2)
* Distributed/replicated with replica 2: I can lose 2 disks **BUT**
not on the same "mirror". Total usable space is 600GB. It's a kind of
RAID10

So far, is it correct?
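
The capacity figures in the two bullets above can be checked with a small Python sketch. It assumes equally sized bricks and ignores filesystem overhead; the function names are hypothetical and are not Gluster commands or APIs.

```python
def disperse_usable_gb(bricks, redundancy, brick_gb):
    """Usable space of a disperse set: only the data bricks count."""
    if not 0 < redundancy < bricks:
        raise ValueError("redundancy must be between 1 and bricks - 1")
    return (bricks - redundancy) * brick_gb


def replicated_usable_gb(bricks, replica, brick_gb):
    """Usable space of a distributed/replicated volume: one brick per mirror set."""
    if bricks % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return (bricks // replica) * brick_gb


# 6 nodes with one 200GB disk each, as in the example above.
print(disperse_usable_gb(6, 2, 200))    # 4+2 disperse
print(replicated_usable_gb(6, 2, 200))  # replica 2, three mirror pairs
```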

My main point is that behavior is very different (pairing disks in
distributed/replicated and "shared" parity in disperse).

Now, let's imagine something else. 4x nodes with 1x 200GB disk each.

Why not have disperse with redundancy 2? It would give the same
storage space as distributed/replicated, **BUT** in disperse I can
lose any 2 disks. In dist/rep, I can lose 2 disks only if they are
not on the same "mirror".

So far, I can't create a disperse volume if the redundancy level is
50% or more of the number of bricks. I know that performance would be
better in dist/rep, but what if I prefer to have disperse anyway?

Conclusion: would it be possible to have a "force" flag during
disperse volume creation even if redundancy is 50% or higher?



Thanks!



Olivier.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users