Hi Dmitry,
On Tue, Sep 29, 2020 at 11:19 AM Dmitry Antipov <dmantipov@xxxxxxxxx> wrote:
For testing purposes, I've set up a localhost-only configuration with 6x16M
ramdisks (formatted as ext4) mounted (with '-o user_xattr') at
/tmp/ram/{0,1,2,3,4,5}, and SHARD_MIN_BLOCK_SIZE lowered to 4K. The
resulting volume is:
Volume Name: test
Type: Distributed-Replicate
Volume ID: 241d6679-7cd7-48b4-bdc5-8bc1c9940ac3
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: [local-ip]:/tmp/ram/0
Brick2: [local-ip]:/tmp/ram/1
Brick3: [local-ip]:/tmp/ram/2
Brick4: [local-ip]:/tmp/ram/3
Brick5: [local-ip]:/tmp/ram/4
Brick6: [local-ip]:/tmp/ram/5
Options Reconfigured:
features.shard-block-size: 64KB
features.shard: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
Then I mount it under /mnt/test:
# mount -t glusterfs [local-ip]:/test /mnt/test
and create a 4M file on it:
# dd if=/dev/random of=/mnt/test/file0 bs=1M count=4
This creates 189 shard files of 64K each in /tmp/ram/?/.shard:
/tmp/ram/0/.shard: 24
/tmp/ram/1/.shard: 24
/tmp/ram/2/.shard: 24
/tmp/ram/3/.shard: 39
/tmp/ram/4/.shard: 39
/tmp/ram/5/.shard: 39
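The count adds up, assuming shard's usual behavior of keeping block 0 in the base file and storing only blocks .1 onward under .shard: 24 + 39 = 63 unique shards spread across the two replica sets, each stored on all 3 bricks of its set. A quick sketch of that arithmetic:

```python
file_size = 4 * 1024 * 1024   # the 4M test file
shard_size = 64 * 1024        # features.shard-block-size
replica = 3                   # replica count per subvolume

blocks = file_size // shard_size   # 64 blocks total
shard_files = blocks - 1           # block 0 lives in the base file, not .shard
on_brick = shard_files * replica   # each shard replicated on 3 bricks

print(blocks, shard_files, on_brick)  # 64 63 189
```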
To simulate data loss I just remove 2 arbitrary .shard directories,
for example:
# rm -rfv /tmp/ram/0/.shard /tmp/ram/5/.shard
Finally, I do a full heal:
# gluster volume heal test full
and successfully get all shards under /tmp/ram/{0,5}/.shard back.
But things seem to go wrong with the following volume:
Volume Name: test
Type: Distributed-Disperse
Volume ID: aa621c7e-1693-427a-9fd5-d7b38c27035e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: [local-ip]:/tmp/ram/0
Brick2: [local-ip]:/tmp/ram/1
Brick3: [local-ip]:/tmp/ram/2
Brick4: [local-ip]:/tmp/ram/3
Brick5: [local-ip]:/tmp/ram/4
Brick6: [local-ip]:/tmp/ram/5
Options Reconfigured:
features.shard: on
features.shard-block-size: 64KB
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
After creating a 4M file as before, I get the same 189 shards,
but 32K each.
This is normal. A dispersed volume writes an encoded fragment of each block to each brick. In this case it's a 2+1 configuration, so each block is divided into 2 data fragments, and a third fragment is generated for redundancy and stored on the third brick.
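That also explains the on-brick file size: each brick holds one fragment, which is the shard size divided by the number of data fragments.

```python
shard_size = 64 * 1024   # features.shard-block-size
data_fragments = 2       # data fragments in a 2+1 disperse set

fragment = shard_size // data_fragments
print(fragment)  # 32768 -> the 32K files seen on each brick
```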
After deleting /tmp/ram/{0,5}/.shard and a full heal,
I was able to get all shards back. But after deleting
/tmp/ram/{3,4}/.shard and a full heal, I ended up with the following:
This is not right. A disperse 2+1 configuration only tolerates a single failure. Wiping 2 of the 3 fragments of the same file makes the file unrecoverable. Disperse uses a Reed-Solomon erasure code, which (in a 2+1 configuration) requires at least 2 healthy fragments to recover the data.
If you want to be able to recover from 2 disk failures, you need to create a 4+2 configuration.
To make it clearer: a 2+1 configuration is like a traditional RAID5 with 3 disks; if you lose 2 disks, data is lost. A 4+2 is similar to a RAID6.
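The RAID5 analogy can be sketched in a few lines. This is an illustrative toy only (simple XOR parity, not Gluster's actual Reed-Solomon code): with 2 data fragments plus 1 parity fragment, any single lost fragment can be rebuilt, but losing two leaves the block unrecoverable.

```python
# Two data fragments and one XOR parity fragment, as in RAID5.
d0 = bytes([1, 2, 3, 4])
d1 = bytes([9, 8, 7, 6])
parity = bytes(a ^ b for a, b in zip(d0, d1))

# Lose d1: the 2 surviving fragments are enough to rebuild it.
rebuilt = bytes(a ^ b for a, b in zip(d0, parity))
assert rebuilt == d1

# Lose d1 AND parity: only d0 survives, and one fragment cannot
# determine the other -- the data is gone, which is what wiping
# two .shard directories in a 2+1 disperse set demonstrates.
```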
Regards,
Xavi
/tmp/ram/0/.shard:
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.10
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.11
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.12
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.13
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.14
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.15
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.16
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.17
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.2
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.22
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.23
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.27
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.28
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.3
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.31
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.34
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.35
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.37
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.39
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.4
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.40
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.44
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.45
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.46
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.47
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.53
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.54
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.55
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.57
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.58
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.6
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.63
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.7
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.9
/tmp/ram/1/.shard:
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.10
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.11
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.12
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.13
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.14
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.15
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.16
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.17
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.2
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.22
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.23
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.27
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.28
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.3
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.31
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.34
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.35
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.37
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.39
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.4
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.40
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.44
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.45
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.46
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.47
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.53
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.54
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.55
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.57
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.58
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.6
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.63
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.7
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.9
/tmp/ram/2/.shard:
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.10
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.11
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.12
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.13
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.14
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.15
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.16
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.17
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.2
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.22
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.23
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.27
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.28
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.3
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.31
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.34
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.35
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.37
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.39
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.4
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.40
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.44
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.45
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.46
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.47
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.53
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.54
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.55
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.57
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.58
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.6
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.63
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.7
-rw-r--r-- 2 root root 32768 Sep 29 12:01 951d7c52-7230-420b-b8bb-da887fffd41e.9
So, /tmp/ram/{3,4}/.shard were not recovered. Even worse, /tmp/ram/5/.shard
has disappeared completely. And of course this breaks all I/O on /mnt/test/file0,
for example:
# dd if=/dev/random of=/mnt/test/file0 bs=1M count=4
dd: error writing '/mnt/test/file0': No such file or directory
dd: closing output file '/mnt/test/file0': No such file or directory
Any ideas on what's going on here?
Dmitry
_______________________________________________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel