Closer, but still no cigar.
all nodes: killall glusterfsd; killall glusterfs;
all nodes: rm -rf /tank/*
all nodes: glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
all nodes: mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
node3:~# cp -R gluster /gtank/gluster1
*simulating a hardware failure
node1:~# killall glusterfsd ; killall glusterfs;
node1:~# killall glusterfsd ; killall glusterfs;
glusterfsd: no process killed
glusterfs: no process killed
node1:~# rm -rf /tank/*
*data never stops changing, just because we have a failed node
node3:~# cp -R gluster /gtank/gluster2
all nodes but node1:~# ls -lR /gtank/ | wc -l
2782
all nodes but node1:~# ls -lR /gtank/gluster1 | wc -l
1393
all nodes but node1:~# ls -lR /gtank/gluster2 | wc -l
1393
*Adding hardware back into the network after replacing bad harddrive(s)
node1:~# glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
node1:~# mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
node3:~# ls -lR /gtank/ | wc -l
1802
node3:~# ls -lR /gtank/gluster1 | wc -l
413
node3:~# ls -lR /gtank/gluster2 | wc -l
1393
Are you aware that taking the broken node1 out fixes the gluster system again?
node1:~# killall glusterfsd ; killall glusterfs;
node1:~# killall glusterfsd ; killall glusterfs;
glusterfsd: no process killed
glusterfs: no process killed
all nodes but node1:~# ls -lR /gtank/ | wc -l
2782
all nodes but node1:~# ls -lR /gtank/gluster1 | wc -l
1393
all nodes but node1:~# ls -lR /gtank/gluster2 | wc -l
1393
Add it back in
node3:~# ls -lR /gtank/gluster1 | wc -l
413
And it's broken again.
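For anyone who wants to repeat the check above without eyeballing the numbers, here is a minimal sketch of the same comparison as a script. It only wraps the `ls -lR | wc -l` measurement used in this thread; `count_lines` and `compare_trees` are hypothetical helper names made up for illustration, not part of GlusterFS. A matching count does not prove the file contents are identical, it only shows the listings have the same number of lines.

```shell
#!/bin/sh
# Sketch only: count_lines and compare_trees are hypothetical helpers,
# not GlusterFS tools. They repeat the ls -lR | wc -l check by hand.

count_lines() {
    # Number of lines in a recursive long listing of directory $1.
    ls -lR "$1" | wc -l | tr -d ' '
}

compare_trees() {
    # Print both counts; return 0 if they match, 1 if they differ.
    a=$(count_lines "$1")
    b=$(count_lines "$2")
    echo "$1: $a"
    echo "$2: $b"
    if [ "$a" -eq "$b" ]; then
        echo "counts match"
        return 0
    fi
    echo "counts differ"
    return 1
}
```

Run on any node after node1 has been added back, for example `compare_trees /gtank/gluster1 /gtank/gluster2`. Both trees are copies of the same source directory, so differing counts would indicate the broken listing shown above.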
Thank you for working on gluster, and for the response!
Anand Avati wrote:
Ender,
A bug fix went into git today which fixes a similar
bug: a case where only a subset of the files would be recreated if there
are a lot of files (~1000 or more) and the node which was down was
the first subvolume in the list. Please pull the latest patches and
see if it solves your case. Thank you for your patience!
Avati
On Thu, Apr 23, 2009 at 2:29 AM, ender <ender@xxxxxxxxxxxxx> wrote:
I was just wondering if the self heal bug is planned to be fixed, or if the
developers are just ignoring it in hopes it will go away? Every time I ask
someone privately if they can reproduce the problem on their own end, they
go silent (which leads me to believe that they in fact can reproduce it).
Very simple: AFR, with as many subvolumes as you want. The first listed
subvolume will always break the self heal. node2 and node3 always heal fine.
Swap the IP address of the first listed subvolume and you will swap the box
which breaks the self heal. I have been able to repeat this bug every day
with the newest git for the last month.
Please let us know if this is not considered a bug, or acknowledge it in
some fashion. Thank you.
same configs
all nodes: killall glusterfsd; killall glusterfs;
all nodes: rm -rf /tank/*
all nodes: glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
all nodes: mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
node3:~# cp -R gluster /gtank/gluster1
*simulating a hardware failure
node1:~# killall glusterfsd ; killall glusterfs;
node1:~# killall glusterfsd ; killall glusterfs;
glusterfsd: no process killed
glusterfs: no process killed
node1:~# rm -rf /tank/*
*data never stops changing, just because we have a failed node
node3:~# cp -R gluster /gtank/gluster2
all nodes but node1:~# ls -lR /gtank/ | wc -l
2780
all nodes but node1:~# ls -lR /gtank/gluster1 | wc -l
1387
all nodes but node1:~# ls -lR /gtank/gluster2 | wc -l
1387
*Adding hardware back into the network after replacing bad harddrive(s)
node1:~# glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
node1:~# mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
node3:~# ls -lR /gtank/ | wc -l
1664
node3:~# ls -lR /gtank/gluster1 | wc -l
271
node3:~# ls -lR /gtank/gluster2 | wc -l
1387
### Export volume "brick" with the contents of "/tank" directory.
volume posix
type storage/posix # POSIX FS translator
option directory /tank # Export this directory
end-volume
volume locks
type features/locks
subvolumes posix
end-volume
volume brick
type performance/io-threads
subvolumes locks
end-volume
### Add network serving capability to above brick.
volume server
type protocol/server
option transport-type tcp
subvolumes brick
option auth.addr.brick.allow * # Allow access to "brick" volume
option client-volume-filename /usr/local/etc/glusterfs/glusterfs.vol
end-volume
#
#mirror block0
#
volume node1
type protocol/client
option transport-type tcp
option remote-host node1.ip # IP address of the remote brick
# option transport-timeout 30 # seconds to wait for a reply from server for each request
option remote-subvolume brick # name of the remote volume
end-volume
volume node2
type protocol/client
option transport-type tcp
option remote-host node2.ip # IP address of the remote brick
# option transport-timeout 30 # seconds to wait for a reply from server for each request
option remote-subvolume brick # name of the remote volume
end-volume
volume node3
type protocol/client
option transport-type tcp
option remote-host node3.ip # IP address of the remote brick
# option transport-timeout 30 # seconds to wait for a reply from server for each request
option remote-subvolume brick # name of the remote volume
end-volume
volume mirrorblock0
type cluster/replicate
subvolumes node1 node2 node3
option metadata-self-heal yes
end-volume
Gordan Bobic wrote:
First-access failing bug still seems to be present.
But other than that, it seems to be distinctly better than rc4. :)
Good work! :)
Gordan
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel