Re: Weird full heal on Distributed-Disperse volume with sharding

Hi Dmitry,

my comments below...

On Tue, Sep 29, 2020 at 11:19 AM Dmitry Antipov <dmantipov@xxxxxxxxx> wrote:
For the testing purposes, I've set up a localhost-only setup with 6x16M
ramdisks (formatted as ext4) mounted (with '-o user_xattr') at
/tmp/ram/{0,1,2,3,4,5} and SHARD_MIN_BLOCK_SIZE lowered to 4K. Finally
the volume is:

Volume Name: test
Type: Distributed-Replicate
Volume ID: 241d6679-7cd7-48b4-bdc5-8bc1c9940ac3
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: [local-ip]:/tmp/ram/0
Brick2: [local-ip]:/tmp/ram/1
Brick3: [local-ip]:/tmp/ram/2
Brick4: [local-ip]:/tmp/ram/3
Brick5: [local-ip]:/tmp/ram/4
Brick6: [local-ip]:/tmp/ram/5
Options Reconfigured:
features.shard-block-size: 64KB
features.shard: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Then I mount it under /mnt/test:

# mount -t glusterfs [local-ip]:/test /mnt/test

and create 4M file on it:

# dd if=/dev/random of=/mnt/test/file0 bs=1M count=4

This creates 189 shard files of 64K each under /tmp/ram/?/.shard (see the note after the listing for where that number comes from):

/tmp/ram/0/.shard: 24
/tmp/ram/1/.shard: 24
/tmp/ram/2/.shard: 24
/tmp/ram/3/.shard: 39
/tmp/ram/4/.shard: 39
/tmp/ram/5/.shard: 39
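
The number adds up as follows (a rough sketch based on the layout above): 4M / 64K = 64 blocks, the first of which stays in the base file, so 63 numbered shards are created; with replica 3 each shard lives on 3 bricks, i.e. 63 x 3 = 189 shard files in total (24 + 39 = 63 per replica set). A quick way to reproduce the per-brick counts, assuming the same ramdisk paths as above:

# for d in /tmp/ram/{0..5}/.shard; do echo "$d: $(ls "$d" | wc -l)"; done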

To simulate data loss I just remove 2 arbitrary .shard directories,
for example:

# rm -rfv /tmp/ram/0/.shard /tmp/ram/5/.shard

Finally, I do full heal:

# gluster volume heal test full

and successfully got all shards under /tmp/ram/{0,5}/.shard back.
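
As a side note, the heal result can also be cross-checked from the gluster side with the standard heal queries (using the volume name from this test):

# gluster volume heal test info
# gluster volume heal test info summary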

But things seem to go weird with the following volume:

Volume Name: test
Type: Distributed-Disperse
Volume ID: aa621c7e-1693-427a-9fd5-d7b38c27035e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: [local-ip]:/tmp/ram/0
Brick2: [local-ip]:/tmp/ram/1
Brick3: [local-ip]:/tmp/ram/2
Brick4: [local-ip]:/tmp/ram/3
Brick5: [local-ip]:/tmp/ram/4
Brick6: [local-ip]:/tmp/ram/5
Options Reconfigured:
features.shard: on
features.shard-block-size: 64KB
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on

After creating a 4M file as before, I got the same 189 shard files,
but 32K each.

This is normal. A dispersed volume writes an encoded fragment of each block to each brick. In this case it's a 2+1 configuration, so each block is divided into 2 data fragments, and a third fragment is generated for redundancy and stored on the third brick.
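
That is also where the 32K on-disk size comes from. Using the shard size configured above:

    fragment size = shard block size / data fragments = 64KB / 2 = 32KB

so each 64KB shard is stored as a 32KB file on every brick of its disperse subvolume.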
 
After deleting /tmp/ram/{0,5}/.shard and full heal,
I was able to get all shards back. But, after deleting
/tmp/ram/{3,4}/.shard and full heal, I've ended up with the following:

This is not right: a disperse 2+1 configuration only tolerates a single failure, so wiping 2 fragments of the same file makes the file unrecoverable. Disperse uses Reed-Solomon erasure coding, which, in a 2+1 configuration, needs at least 2 healthy fragments to recover the data.

If you want to be able to recover from 2 disk failures, you need to create a 4+2 configuration.

To make it clearer: a 2+1 configuration is like a traditional RAID5 with 3 disks; if you lose 2 disks, data is lost. A 4+2 is similar to a RAID6.
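
For reference, on the same six bricks a 4+2 dispersed volume could be created with something along these lines (just a sketch reusing the brick paths from this test; 'force' may be needed because all bricks sit on a single host):

# gluster volume create test disperse 6 redundancy 2 [local-ip]:/tmp/ram/{0..5} force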

Regards,

Xavi


/tmp/ram/0/.shard:
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.10
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.11
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.12
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.13
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.14
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.15
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.16
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.17
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.2
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.22
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.23
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.27
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.28
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.3
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.31
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.34
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.35
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.37
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.39
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.4
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.40
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.44
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.45
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.46
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.47
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.53
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.54
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.55
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.57
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.58
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.6
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.63
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.7
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.9

/tmp/ram/1/.shard:
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.10
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.11
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.12
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.13
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.14
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.15
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.16
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.17
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.2
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.22
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.23
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.27
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.28
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.3
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.31
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.34
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.35
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.37
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.39
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.4
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.40
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.44
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.45
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.46
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.47
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.53
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.54
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.55
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.57
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.58
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.6
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.63
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.7
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.9

/tmp/ram/2/.shard:
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.10
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.11
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.12
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.13
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.14
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.15
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.16
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.17
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.2
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.22
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.23
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.27
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.28
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.3
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.31
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.34
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.35
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.37
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.39
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.4
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.40
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.44
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.45
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.46
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.47
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.53
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.54
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.55
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.57
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.58
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.6
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.63
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.7
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.9

So the shards under /tmp/ram/{3,4}/.shard were not recovered. Even worse, /tmp/ram/5/.shard
has disappeared completely. And of course this breaks all I/O on /mnt/test/file0,
for example:

# dd if=/dev/random of=/mnt/test/file0 bs=1M count=4
dd: error writing '/mnt/test/file0': No such file or directory
dd: closing output file '/mnt/test/file0': No such file or directory

Any ideas on what's going on here? 

Dmitry
_______________________________________________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968




Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel
