Hi,
To test corruption detection and repair, we modified a file inside the brick directory on server glusterfs1, and scheduled regular scrubs. The corruption is detected:
Error count: 1
Corrupted object's [GFID]:
9be5eecf-5ad8-4256-8b08-879aecf65881 ==> BRICK: /data/brick1/gv0
path: /prd/drupal-files-prd/inline-images/small - main building 1_0.jpg
Corrupted object's [GFID]:
9be5eecf-5ad8-4256-8b08-879aecf65881 ==> BRICK: /data/brick1/gv0
path: /prd/drupal-files-prd/inline-images/small - main building 1_0.jpg
We have self-healing enabled, and used these steps to correct the corrupted object:
user@glusterfs1:~$ sudo find /data/brick1/gv0/.glusterfs -name 9be5eecf-5ad8-4256-8b08-879aecf65881
/data/brick1/gv0/.glusterfs/9b/e5/9be5eecf-5ad8-4256-8b08-879aecf65881
/data/brick1/gv0/.glusterfs/quarantine/9be5eecf-5ad8-4256-8b08-879aecf65881
user@glusterfs1:~$ sudo find /data/brick1 -samefile /data/brick1/gv0/.glusterfs/9b/e5/9be5eecf-5ad8-4256-8b08-879aecf65881
/data/brick1/gv0/.glusterfs/9b/e5/9be5eecf-5ad8-4256-8b08-879aecf65881
/data/brick1/gv0/prd/drupal-files-prd/inline-images/small - main building 1_0.jpg
user@glusterfs1:~$ sudo rm /data/brick1/gv0/.glusterfs/9b/e5/9be5eecf-5ad8-4256-8b08-879aecf65881
user@glusterfs1:~$ sudo rm "/data/brick1/gv0/prd/drupal-files-prd/inline-images/small - main building 1_0.jpg"
user@glusterfs1:~$ stat /glusterfs/prd/drupal-files-prd/inline-images/"small - main building 1_0.jpg"
File: /glusterfs/prd/drupal-files-prd/inline-images/small - main building 1_0.jpg
Size: 6296651 Blocks: 12299 IO Block: 131072 regular file
Device: 2dh/45d Inode: 10018406470555949185 Links: 1
Access: (0664/-rw-rw-r--) Uid: (42333178/ UNKNOWN) Gid: (41000002/ UNKNOWN)
Access: 2022-11-22 11:58:10.080206283 +0100
Modify: 2022-05-12 17:00:41.000000000 +0200
Change: 2022-11-22 12:41:18.095579069 +0100
Birth: -
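The lookup part of the steps above (GFID to `.glusterfs` path, then every hard link to it) can be sketched as a small read-only helper. The function names `gfid_path` and `list_gfid_links` are hypothetical, and the sketch assumes the `.glusterfs/<first two chars>/<next two chars>/<gfid>` layout visible in the `find` output above. It removes nothing.

```shell
# Sketch only: gfid_path and list_gfid_links are hypothetical helper names.

# Build the .glusterfs path for a GFID, following the layout seen above:
#   <brick>/.glusterfs/<chars 1-2>/<chars 3-4>/<gfid>
gfid_path() {
    gp_brick=$1
    gp_gfid=$2
    gp_a=$(printf '%s' "$gp_gfid" | cut -c1-2)
    gp_b=$(printf '%s' "$gp_gfid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$gp_brick" "$gp_a" "$gp_b" "$gp_gfid"
}

# Print every path inside the brick that shares an inode with the GFID entry.
# Read-only: nothing is deleted here.
list_gfid_links() {
    find "$1" -samefile "$(gfid_path "$1" "$2")"
}

# Usage, from a root shell on the affected server:
#   list_gfid_links /data/brick1/gv0 9be5eecf-5ad8-4256-8b08-879aecf65881
```

Printing the links before running any `rm` makes it easier to confirm that only the GFID entry and the one named file are affected.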
That seems to have worked, but we want to confirm the repair, so we used md5sum to compare the file across our three glusterfs servers:
user@glusterfs1:~$ sudo find /data/brick1/gv0/.glusterfs -name 9be5eecf-5ad8-4256-8b08-879aecf65881
/data/brick1/gv0/.glusterfs/9b/e5/9be5eecf-5ad8-4256-8b08-879aecf65881
/data/brick1/gv0/.glusterfs/quarantine/9be5eecf-5ad8-4256-8b08-879aecf65881
user@glusterfs1:~$ sudo md5sum /data/brick1/gv0/.glusterfs/9b/e5/9be5eecf-5ad8-4256-8b08-879aecf65881
d41d8cd98f00b204e9800998ecf8427e /data/brick1/gv0/.glusterfs/9b/e5/9be5eecf-5ad8-4256-8b08-879aecf65881
user@glusterfs1:~$ sudo md5sum /data/brick1/gv0/.glusterfs/quarantine/9be5eecf-5ad8-4256-8b08-879aecf65881
d41d8cd98f00b204e9800998ecf8427e /data/brick1/gv0/.glusterfs/quarantine/9be5eecf-5ad8-4256-8b08-879aecf65881
But on servers glusterfs2 and glusterfs3:
user@glusterfs2:~$ sudo md5sum /data/brick1/gv0/.glusterfs/9b/e5/9be5eecf-5ad8-4256-8b08-879aecf65881
d4927e00e0db4498bcbbaedf3b5680ed /data/brick1/gv0/.glusterfs/9b/e5/9be5eecf-5ad8-4256-8b08-879aecf65881
user@glusterfs3:~$ sudo md5sum /data/brick1/gv0/.glusterfs/9b/e5/9be5eecf-5ad8-4256-8b08-879aecf65881
d4927e00e0db4498bcbbaedf3b5680ed /data/brick1/gv0/.glusterfs/9b/e5/9be5eecf-5ad8-4256-8b08-879aecf65881
The md5sum on the repaired server (glusterfs1) does NOT match the one on glusterfs2 and glusterfs3.
What is wrong in our logic, and why is this happening?
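One observation on the hashes above: `d41d8cd98f00b204e9800998ecf8427e` is the MD5 of empty input, so both paths hashed on glusterfs1 were zero-byte files at that moment. It is not a case of differing content. Why the brick copy was still empty is not shown by the output in this mail. One possibility, stated here only as an assumption, is that the entry had been recreated but its data had not yet been healed. A minimal sketch to check for this case (the helper name `is_empty_md5` is hypothetical):

```shell
# MD5 of zero bytes of input; this is a fixed value.
EMPTY_MD5=$(printf '' | md5sum | cut -d' ' -f1)
echo "$EMPTY_MD5"   # d41d8cd98f00b204e9800998ecf8427e

# Hypothetical helper: succeeds if the given file hashes like empty input.
is_empty_md5() {
    [ "$(md5sum < "$1" | cut -d' ' -f1)" = "$EMPTY_MD5" ]
}

# Usage, from a root shell on glusterfs1:
#   is_empty_md5 /data/brick1/gv0/.glusterfs/9b/e5/9be5eecf-5ad8-4256-8b08-879aecf65881 \
#       && echo "brick copy is still empty"
```

Running `stat -c %s` on the same brick path would show the size directly.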
Some cluster info:
user@glusterfs2:~$ sudo gluster volume get gv0 all | grep self-heal
cluster.background-self-heal-count 8 (DEFAULT)
cluster.metadata-self-heal on
cluster.data-self-heal on
cluster.entry-self-heal on
cluster.self-heal-daemon on (DEFAULT)
cluster.self-heal-window-size 8 (DEFAULT)
cluster.data-self-heal-algorithm (null) (DEFAULT)
cluster.self-heal-readdir-size 1KB (DEFAULT)
cluster.disperse-self-heal-daemon enable (DEFAULT)
disperse.self-heal-window-size 32 (DEFAULT)
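With the self-heal daemon on, as shown above, `gluster volume heal gv0 info` reports whether any heals are still pending. A minimal sketch that totals the pending entries follows. It assumes the output contains one `Number of entries: N` line per brick, and the helper name `pending_heal_entries` is hypothetical.

```shell
# Hypothetical helper: sum the "Number of entries: N" lines read from stdin.
# Lines without a numeric count contribute 0 to the total.
pending_heal_entries() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Usage on any server in the cluster:
#   sudo gluster volume heal gv0 info | pending_heal_entries
```

A total above zero would indicate that a heal is still outstanding, in which case comparing checksums on the bricks is premature.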
GlusterFS 10.1 running on Ubuntu 22.04.1 x86_64.
Any help would be appreciated!
MJ
________
Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users