Re: Weird full heal on Distributed-Disperse volume with sharding

On 9/30/20 8:58 AM, Xavi Hernandez wrote:

> This is normal. A dispersed volume writes encoded fragments of each block in each brick. In this case it's a 2+1 configuration, so each block is divided into 2 fragments. A third fragment is generated for redundancy and stored on the third brick.

OK. But for a Distributed-Replicate 2 x 3 setup with 64K shards, a 4M file should be split into (4096 / 64) * 3 = 192 shards, not 189. So why 189?

And if all bricks are considered equal and have enough free space, the shard distribution {24, 24, 24, 39, 39, 39} looks suboptimal.
Why not {31, 32, 31, 32, 31, 32}? Isn't it a bug?
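To make the arithmetic above explicit, here is a minimal sketch in Python. The helper names (expected_fragments, even_split) are made up for this illustration and are not part of any Gluster tool; the numbers are the ones quoted in this message.

```python
import math

def expected_fragments(file_kib, shard_kib, fragments_per_shard):
    """Number of on-brick pieces if every shard is stored
    fragments_per_shard times (hypothetical helper)."""
    shards = math.ceil(file_kib / shard_kib)
    return shards * fragments_per_shard

def even_split(total, bricks):
    """Spread `total` items over `bricks` as evenly as possible."""
    base, extra = divmod(total, bricks)
    return [base + 1 if i < extra else base for i in range(bricks)]

# Per-brick counts observed on the six bricks, as reported above.
observed = [24, 24, 24, 39, 39, 39]

expected = expected_fragments(4096, 64, 3)  # 4M file, 64K shards, 3 bricks per set
print("expected:", expected)                # 192
print("observed:", sum(observed))           # 189
print("missing: ", expected - sum(observed))
print("even split of observed total:", even_split(sum(observed), 6))
```

The even split has the same counts as {31, 32, 31, 32, 31, 32}, just listed in a different order.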

> This is not right. A disperse 2+1 configuration only supports a single failure. Wiping 2 fragments from the same file makes the file unrecoverable. Disperse works using the Reed-Solomon erasure code, which requires at least 2 healthy fragments to recover the data (in a 2+1 configuration).
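The "any 2 of 3 fragments" property can be illustrated with a toy sketch. Note that this uses plain XOR parity, the simplest 2+1 erasure code, and NOT the Reed-Solomon encoding that disperse actually uses; the function names are hypothetical and only meant to show why losing two fragments is fatal.

```python
def encode(d0: bytes, d1: bytes):
    """Toy 2+1 code: two data fragments plus one XOR parity fragment.
    (Stand-in for illustration only, not Gluster's real encoding.)"""
    if len(d0) != len(d1):
        raise ValueError("fragments must have equal length")
    parity = bytes(a ^ b for a, b in zip(d0, d1))
    return [d0, d1, parity]

def recover(fragments):
    """Return (d0, d1) from a 3-item list where lost fragments are None."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("unrecoverable: fewer than 2 healthy fragments")
    fragments = list(fragments)
    if missing:
        # Any single fragment is the XOR of the other two.
        a, b = (f for f in fragments if f is not None)
        fragments[missing[0]] = bytes(x ^ y for x, y in zip(a, b))
    return fragments[0], fragments[1]

frags = encode(b"ab", b"cd")
print(recover([None, frags[1], frags[2]]))   # one fragment lost: data comes back
try:
    recover([None, None, frags[2]])          # two fragments lost
except ValueError as err:
    print(err)
```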

It seems that I missed the point that all bricks are considered equal, regardless of the physical host they're attached to.

So, for a Distributed-Disperse 2 x (2 + 1) setup with 3 hosts, 2 bricks each, and two files, A and B, it's possible to have
the following layout:

Host0:                  Host1:                  Host2:
|- Brick0: A0 B0        |- Brick0: A1           |- Brick0: A2
|- Brick1: B1           |- Brick1: B2           |- Brick1:

This setup can tolerate a single brick failure but not a single host failure: if Host0 is down, two fragments of B are lost,
so B becomes unrecoverable (but A does not).

If this is so, is it possible/hard to enforce 'one fragment per *host*' behavior? If we can guarantee the following:

Host0:                  Host1:                  Host2:
|- Brick0: A0           |- Brick0: A1           |- Brick0: A2
|- Brick1: B1           |- Brick1: B2           |- Brick1: B0

then this setup can tolerate both single-brick and single-host failures.
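The two layouts above can be checked mechanically. The sketch below is only an illustration: the dictionary representation and the function unrecoverable_files are made up for this message and do not correspond to any Gluster interface. It assumes a 2+1 configuration, so a file survives as long as it loses at most one fragment.

```python
from collections import defaultdict

def unrecoverable_files(layout, failed_host, max_lost=1):
    """Files that lose more than max_lost fragments when failed_host goes down.
    layout maps (host, brick) -> list of fragment names such as "A0"."""
    lost = defaultdict(int)
    for (host, _brick), fragments in layout.items():
        if host == failed_host:
            for frag in fragments:
                lost[frag[0]] += 1   # first character is the file name
    return sorted(name for name, count in lost.items() if count > max_lost)

first_layout = {
    ("Host0", "Brick0"): ["A0", "B0"], ("Host0", "Brick1"): ["B1"],
    ("Host1", "Brick0"): ["A1"],       ("Host1", "Brick1"): ["B2"],
    ("Host2", "Brick0"): ["A2"],       ("Host2", "Brick1"): [],
}
second_layout = {
    ("Host0", "Brick0"): ["A0"], ("Host0", "Brick1"): ["B1"],
    ("Host1", "Brick0"): ["A1"], ("Host1", "Brick1"): ["B2"],
    ("Host2", "Brick0"): ["A2"], ("Host2", "Brick1"): ["B0"],
}

for name, layout in (("first", first_layout), ("second", second_layout)):
    for host in ("Host0", "Host1", "Host2"):
        print(name, host, "down ->", unrecoverable_files(layout, host))
```

In the first layout only the loss of Host0 is fatal (for B); in the second, no single host failure makes either file unrecoverable.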

Dmitry
_______________________________________________

Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel



