Hi all,
I noticed that I have two files which are not healed:
root@giant5:~# gluster volume heal gv0 info
Gathering Heal info on volume gv0 has been successful
Brick giant1:/gluster/sdc/gv0
Number of entries: 1
/holicki/lqcd/slurm-7251.out
Brick giant2:/gluster/sdc/gv0
Number of entries: 1
/holicki/lqcd/slurm-7251.out
Brick giant3:/gluster/sdc/gv0
Number of entries: 1
/holicki/lqcd/slurm-7249.out
Brick giant4:/gluster/sdc/gv0
Number of entries: 1
/holicki/lqcd/slurm-7249.out
Brick giant5:/gluster/sdc/gv0
Number of entries: 1
<gfid:e9793d5e-7174-49b0-9fa9-90f8c35948e7>
Brick giant6:/gluster/sdc/gv0
Number of entries: 1
<gfid:e9793d5e-7174-49b0-9fa9-90f8c35948e7>
Brick giant1:/gluster/sdd/gv0
Number of entries: 1
/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
Brick giant2:/gluster/sdd/gv0
Number of entries: 1
/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
Brick giant3:/gluster/sdd/gv0
Number of entries: 0
Brick giant4:/gluster/sdd/gv0
Number of entries: 0
Brick giant5:/gluster/sdd/gv0
Number of entries: 0
Brick giant6:/gluster/sdd/gv0
Number of entries: 0
(Disregard the file "slurm-7251.out"; it had I/O in progress when I ran the command.)
The logs are filled with entries like this:
[2016-09-30 12:45:26.611375] I [afr-self-heal-data.c:655:afr_sh_data_fix] 0-gv0-replicate-3: no active sinks for performing self-heal on file /jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
[2016-09-30 12:45:36.874802] I [afr-self-heal-data.c:655:afr_sh_data_fix] 0-gv0-replicate-3: no active sinks for performing self-heal on file /jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
[2016-09-30 12:45:53.701884] I [afr-self-heal-data.c:655:afr_sh_data_fix] 0-gv0-replicate-3: no active sinks for performing self-heal on file /jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
I checked with md5sum that both files are identical.
Then, I used setfattr as proposed in an older thread in this mailing list:
setfattr -n trusted.afr.gv0-client-7 -v 0x000000000000000000000000 /gluster/sdd/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
I did this on both nodes for both clients, so it now looks like this (on both nodes/bricks):
getfattr -d -m . -e hex /gluster/sdd/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
getfattr: Removing leading '/' from absolute path names
# file: gluster/sdd/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
trusted.afr.gv0-client-6=0x000000000000000000000000
trusted.afr.gv0-client-7=0x000000000000000000000000
trusted.gfid=0xcb7978fa42e74a0b97928a87126338ac
I triggered a heal, but the files do not disappear from heal info. They are also not listed under split-brain or heal-failed.
I used gfid-resolver.sh for the other file:
e9793d5e-7174-49b0-9fa9-90f8c35948e7 == File: /gluster/sdc/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu0.800/slurm-5663.out
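For reference, a minimal sketch of where a gfid entry lives on the brick backend. It assumes the usual layout in which each gfid is stored under `.glusterfs/<first two hex digits>/<next two hex digits>/<gfid>` inside the brick directory, as a hard link for regular files. The function name `gfid_backend_path` is made up for illustration and is not part of gluster or gfid-resolver.sh.

```shell
#!/bin/sh
# Hypothetical helper: build the .glusterfs backend path for a gfid.
# Assumes the <brick>/.glusterfs/aa/bb/<gfid> layout described above.
gfid_backend_path() (
    brick=$1
    gfid=$2
    first=$(printf '%s' "$gfid" | cut -c1-2)    # first two hex digits
    second=$(printf '%s' "$gfid" | cut -c3-4)   # next two hex digits
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$first" "$second" "$gfid"
)

gfid_backend_path /gluster/sdc/gv0 e9793d5e-7174-49b0-9fa9-90f8c35948e7
```

For a regular file, running `find /gluster/sdc/gv0 -samefile <that path>` on the brick should then list the real file name next to the `.glusterfs` entry, since both are hard links to the same inode.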
This file (slurm-5663.out) is also marked as dirty:
root@giant5:/var/log/glusterfs# getfattr -d -m . -e hex /gluster/sdc/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu0.800/slurm-5663.out
getfattr: Removing leading '/' from absolute path names
# file: gluster/sdc/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu0.800/slurm-5663.out
trusted.afr.gv0-client-4=0x000000010000000000000000
trusted.afr.gv0-client-5=0x000000010000000000000000
trusted.gfid=0xe9793d5e717449b09fa990f8c35948e7
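To make the xattr values above easier to read, here is a small sketch that splits a `trusted.afr.*` value into its counters. It assumes the usual AFR changelog layout of three big-endian 32-bit counters, in the order data, metadata, entry. The function name `decode_afr` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: decode a 12-byte trusted.afr.* changelog value.
# Assumes three big-endian 32-bit pending counters: data, metadata, entry.
decode_afr() (
    hex=${1#0x}                                  # drop the leading 0x, if any
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        exit 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    printf 'data=%d metadata=%d entry=%d\n' "0x$data" "0x$meta" "0x$entry"
)

decode_afr 0x000000010000000000000000   # prints: data=1 metadata=0 entry=0
```

Read this way, both bricks of the pair record one pending data operation for slurm-5663.out, against themselves and against each other, while the metadata and entry counters are zero.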
How can I fix this, i.e. get the files healed? I'm using gluster 3.4.2 on Ubuntu 14.04.3.
I also thought about scheduling downtime and upgrading gluster, but I don't know whether I can do that while there are still files to be healed.
Thanks for any advice.