Re: Problem with self-heal

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Thu, 03 Jul 2014 21:24:33 +0530

On 07/03/2014 08:56 PM, Tiziano Müller wrote:
Hi Pranith

On 03.07.2014 17:16, Pranith Kumar Karampuri wrote:
[...]
Is there some documentation on the meaning of the
trusted.afr.virtualization-client attribute?
https://github.com/gluster/glusterfs/blob/master/doc/features/afr-v1.md
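To make the linked description a little more concrete: on a brick, `getfattr -d -m . -e hex <file>` prints each AFR changelog attribute (typically named trusted.afr.<volume>-client-<N>) as a hex string. Assuming the AFR v1 layout, that value is 12 bytes holding three big-endian 32-bit pending counters for data, metadata and entry operations. The following is a minimal Python sketch that decodes one such value. The function name and the sample value are hypothetical.

```python
import struct
from typing import NamedTuple


class AfrPending(NamedTuple):
    """Pending-operation counters from one AFR changelog xattr (assumed layout)."""
    data: int
    metadata: int
    entry: int


def decode_afr_changelog(hex_value: str) -> AfrPending:
    """Decode a getfattr hex value such as '0x000000020000000000000000'.

    Assumes the AFR v1 layout: 12 bytes, i.e. three big-endian uint32
    counters in the order data, metadata, entry.
    """
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    return AfrPending(*struct.unpack(">III", raw))


if __name__ == "__main__":
    # Hypothetical value: two data operations pending against the other brick.
    print(decode_afr_changelog("0x000000020000000000000000"))
    # prints: AfrPending(data=2, metadata=0, entry=0)
```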
Thanks.

Is I/O happening on those files? I assume so, since they are VM images.
Releases earlier than 3.5.1 had a problem with false positives: they could not
distinguish between ongoing I/O and a genuine need for self-heal, so files under
active I/O were listed as needing self-heal even when they did not.
Ok, that explains why some of the files are suddenly listed and then vanish again.

The problem is that when we shut down all VMs (which were using gfapi) last
week, some images were still listed as needing self-heal even though no I/O was
happening. Even after a gluster vol stop/start and a reboot, the same files were
listed and nothing changed. After comparing the checksums of the files on the
two bricks, we resumed operation.
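The checksum comparison described above is usually done with md5sum or sha256sum on each node. For reference, a minimal Python sketch of the same check follows, run once per brick against the brick-side copy of the image while no I/O is in progress; the digests printed on the two nodes are then compared by hand. The function name and the chunk size are illustrative choices, not anything GlusterFS provides.

```python
import hashlib
import sys


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks
    (1 MiB by default) so multi-gigabyte VM images need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Pass the brick-side path(s) of the image on this node (hypothetical
    # paths), then compare the output with the same command on the other node.
    for path in sys.argv[1:]:
        print(file_digest(path), path)
```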
It would help if you could send the getfattr output the next time this
happens, so that we can try to work out why it is happening.
These are AFR changelog "smells" I have developed over time working on AFR;
they are right most of the time, but not always:
Looking at the getfattr output from both bricks:
1) If the changelog numbers are equal and the files are undergoing changes,
it is most probably just normal I/O and no heal is required.
2) If the numbers are unequal, differing by a lot, and the files are undergoing
changes, then heal is most probably required while I/O is going on.
3) If the numbers are unequal and the files are not undergoing changes, then
heal is required.
4) If the numbers are equal and the files are not undergoing changes, then the
mount most likely crashed, or the volume was stopped, while I/O was in
progress.

Again, these are only the most probable explanations, not definitive diagnoses.
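The four rules of thumb above can be written down as a small decision function, mainly to make the branches explicit. This is a minimal Python sketch under several assumptions: count_a and count_b are the pending counters read from the changelog xattrs on each brick, the caller already knows whether the file is under I/O, and big_gap is a hypothetical threshold for "differing by a lot" (no such value comes from the thread). It also assumes the file is already listed by heal info. The case of a small difference under I/O is not covered by the rules, so the sketch reports it as unclear.

```python
def guess_afr_state(count_a: int, count_b: int, under_io: bool,
                    big_gap: int = 100) -> str:
    """Return the most probable explanation per the four heuristics.

    count_a, count_b: changelog counters seen on brick A and brick B.
    under_io: whether the file is currently being written.
    big_gap: hypothetical threshold for 'differing by a lot'.
    Assumes the file was listed by 'heal info' in the first place.
    """
    equal = count_a == count_b
    if under_io:
        if equal:
            return "normal I/O, probably no heal needed"          # rule 1
        if abs(count_a - count_b) >= big_gap:
            return "heal probably needed while I/O continues"     # rule 2
        return "unclear, re-check later"                          # not covered
    if not equal:
        return "heal needed"                                      # rule 3
    return "mount crashed or volume stopped during I/O"           # rule 4
```

These remain guesses: a result from this function is a hint about where to look, not a verdict on whether the replicas are consistent.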

Pranith

Any ideas?

Best,
Tiziano

Pranith

Thanks in advance,
Tiziano

Pranith
Best,
Tiziano

On 01.07.2014 22:58, Miloš Kozák wrote:
Hi,
I am running some tests on v3.5.1 in a two-node configuration, with one disk
per node, in replica 2 mode.

The two servers are connected by a single cable, over which glusterd
communicates. I start dd to create a relatively large file. In the middle of
the write I disconnect the cable, so when writing has finished, one server
(node1) has all of the data and the other (node2) has only part of the file.
No surprise so far.

Then I reconnect the cable. After a while the peers are rediscovered and the
self-heal daemons start to communicate, so I can see:

gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1

Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1
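When watching a heal like this over time, it can help to reduce the heal info output to a per-brick list of pending entries. The following is a minimal Python sketch that assumes exactly the line format shown above ("Brick ...", one path per line with an optional " - status" suffix, then "Number of entries: N"); other releases may format the output differently, and the function name is hypothetical.

```python
def parse_heal_info(text: str) -> dict[str, list[str]]:
    """Return {brick: [entry paths]} from 'gluster volume heal <vol> info' output.

    Assumes the 3.5-style layout: a 'Brick <host>:<path>' header, one entry
    per line, then 'Number of entries: N'. Lines before the first brick
    header (such as the command itself) are ignored.
    """
    result: dict[str, list[str]] = {}
    brick = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
            result[brick] = []
        elif brick is None or not line or line.startswith("Number of entries:"):
            continue
        else:
            # Drop the status annotation, e.g. " - Possibly undergoing heal".
            result[brick].append(line.split(" - ", 1)[0])
    return result
```

Running it periodically and comparing the results shows whether entries are actually being cleared or are stuck, as in the case described here.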

But no data is moving over the network, which I verified with df.

Any help? I would expect the nodes to be synchronized after a while, but after
20 minutes of waiting there is still no progress (the file is 2 GB).

Thanks Milos
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
