Yes, self-heal seems to be broken. AFR appears to trust the first child in the list of subvolumes regardless of its state. So with an AFR of three children (node1, node2, node3): if node2 or node3 blows up (failed hard drive, whatever, maybe just pulled for security updates), everything is fine when it is added back. If you have to pull node1 for any reason, everything breaks when it is added back.
I created the following screen capture to show this more clearly.
http://enderzone.com/gluster.ogg
"killall glusterfsd and rm -rf /tank" = harddrive failure (very common and the point of AFR.
If AFR does not protect against hardware failure, what is it for?
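For reference, by "first child in the list of subvolumes" I mean the order in the cluster/afr section of the client volfile. A minimal sketch (the volume names and host address here are placeholders, not my actual config):

  volume node1
    type protocol/client
    option remote-host 10.0.0.1      # placeholder address for the first server
    option remote-subvolume brick
  end-volume

  # node2 and node3 are declared the same way, pointing at the other servers

  volume afr0
    type cluster/afr
    # node1 is the first child; pulling it and adding it back is what breaks self heal
    subvolumes node1 node2 node3
  end-volume

Whichever name comes first on that subvolumes line is the one self heal seems to trust unconditionally.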
Nicolas Prochazka wrote:
hello,
I found a bug in self-healing:
Two AFR servers, each also acting as a client.
GlusterFS mount point: /mnt/vdisk
GlusterFS backend directory: /mnt/disk
1 - touch /mnt/vdisk/TEST1: OK, the file appears on both servers.
2a - rm /mnt/disk/TEST1 on the first server defined in the AFR translator
-> ls -l /mnt/vdisk shows nothing on all servers: OK
2b - (instead of 2a): rm /mnt/disk/TEST1 on the second server defined in the AFR translator
-> ls -l /mnt/vdisk still shows TEST1 on all servers: not OK
This is the first bug. I think the problem is that load balancing is not working: commands are always executed on the same server, the first one defined. The same problem also appears with read-subvolumes, which does not work.
3a - (the second server is defined as favorite-child): no synchronisation, TEST1 is never recreated (expected, since operations are always done from server 1).
Now I write some data into /mnt/disk/TEST1 on the second server, then touch /mnt/vdisk/TEST1 again => TEST1 is synchronised on both servers with the content from server 2: OK.
In my opinion, ls /mnt/vdisk should not always read its data from the same server, should it?
I can work around this problem by touching, through /mnt/vdisk, every file present on the backend of server 2; ls /mnt/vdisk then reports a file size of 0, but favorite-child resynchronises them with the correct content.
To summarize: if I reinstall a server from scratch and, in my client configuration file, that server is declared first in the afr subvolumes, it cannot be synchronised from the second server.
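For reference, the relevant part of my client volume file looks roughly like this (hostnames are placeholders); server1 is the subvolume declared first:

  volume server1
    type protocol/client
    option remote-host host1         # first server (placeholder name)
    option remote-subvolume brick    # brick exporting /mnt/disk
  end-volume

  volume server2
    type protocol/client
    option remote-host host2         # second server (placeholder name)
    option remote-subvolume brick
  end-volume

  volume afr
    type cluster/afr
    subvolumes server1 server2       # server1 is declared first
    option favorite-child server2    # setting used in step 3a
  end-volume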
Regards,
Nicolas Prochazka.
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel