Ender,

Please try the latest git. We did find an issue with subdirs getting
skipped while syncing.

Avati

On Thu, Apr 23, 2009 at 3:24 AM, ender <ender@xxxxxxxxxxxxx> wrote:
> Closer, but still no cigar...
>
> all nodes: killall glusterfsd; killall glusterfs;
> all nodes: rm -rf /tank/*
> all nodes: glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
> all nodes: mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
> node3:~# cp -R gluster /gtank/gluster1
> *simulating a hardware failure
> node1:~# killall glusterfsd ; killall glusterfs;
> node1:~# killall glusterfsd ; killall glusterfs;
> glusterfsd: no process killed
> glusterfs: no process killed
> node1:~# rm -rf /tank/*
> *data never stops changing, just because we have a failed node
> node3:~# cp -R gluster /gtank/gluster2
> all nodes but node1:~# ls -lR /gtank/ | wc -l
> 2782
> all nodes but node1:~# ls -lR /gtank/gluster1 | wc -l
> 1393
> all nodes but node1:~# ls -lR /gtank/gluster2 | wc -l
> 1393
> *Adding hardware back into the network after replacing bad harddrive(s)
> node1:~# glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
> node1:~# mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
> node3:~# ls -lR /gtank/ | wc -l
> 1802
> node3:~# ls -lR /gtank/gluster1 | wc -l
> 413
> node3:~# ls -lR /gtank/gluster2 | wc -l
> 1393
>
> Are you aware that taking the broken node1 out fixes the gluster system again?
> node1:~# killall glusterfsd ; killall glusterfs;
> node1:~# killall glusterfsd ; killall glusterfs;
> glusterfsd: no process killed
> glusterfs: no process killed
> all nodes but node1:~# ls -lR /gtank/ | wc -l
> 2782
> all nodes but node1:~# ls -lR /gtank/gluster1 | wc -l
> 1393
> all nodes but node1:~# ls -lR /gtank/gluster2 | wc -l
> 1393
>
> Add it back in:
> node3:~# ls -lR /gtank/gluster1 | wc -l
> 413
>
> And it's broken again.
>
>
> Thank you for working on gluster, and for the response!
>
> Anand Avati wrote:
>>
>> Ender,
>> There was a bug fix which went into git today which fixes a similar
>> bug: a case where only a subset of the files would be recreated when
>> there are a lot of files (~1000 or more) and the node which was down
>> was the first subvolume in the list. Please pull the latest patches
>> and see if it solves your case. Thank you for your patience!
>>
>> Avati
>>
>> On Thu, Apr 23, 2009 at 2:29 AM, ender <ender@xxxxxxxxxxxxx> wrote:
>>>
>>> I was just wondering if the self-heal bug is planned to be fixed, or
>>> if the developers are just ignoring it in hopes it will go away.
>>> Every time I ask someone privately whether they can reproduce the
>>> problem on their own end, they go silent (which leads me to believe
>>> that they can, in fact, reproduce it).
>>>
>>> Very simple: AFR, with as many subvolumes as you want. The first
>>> listed subvolume will always break the self heal; node2 and node3
>>> always heal fine. Swap the IP address of the first listed subvolume
>>> and you will swap the box which breaks the self heal (see the
>>> reordering sketch after the configs below). I have been able to
>>> repeat this bug every day with the newest git for the last month.
>>> Please let us know if this is not considered a bug, or acknowledge
>>> it in some fashion. Thank you.
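>>>
>>> The counts in the repro below are taken with ls -lR on the glusterfs
>>> mount. As a rough cross-check, the same count can be taken directly
>>> on the backend export directory (/tank in the configs below) on each
>>> node, for example:
>>>
>>> node1:~# ls -lR /tank/ | wc -l
>>> node2:~# ls -lR /tank/ | wc -l
>>> node3:~# ls -lR /tank/ | wc -l
>>>
>>> With plain replicate every brick holds a full copy, so once self heal
>>> has finished the three backend counts should roughly match each other
>>> and the count seen on the mount.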
>>>
>>> Same configs:
>>>
>>> all nodes: killall glusterfsd; killall glusterfs;
>>> all nodes: rm -rf /tank/*
>>> all nodes: glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
>>> all nodes: mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
>>> node3:~# cp -R gluster /gtank/gluster1
>>> *simulating a hardware failure
>>> node1:~# killall glusterfsd ; killall glusterfs;
>>> node1:~# killall glusterfsd ; killall glusterfs;
>>> glusterfsd: no process killed
>>> glusterfs: no process killed
>>> node1:~# rm -rf /tank/*
>>> *data never stops changing, just because we have a failed node
>>> node3:~# cp -R gluster /gtank/gluster2
>>> all nodes but node1:~# ls -lR /gtank/ | wc -l
>>> 2780
>>> all nodes but node1:~# ls -lR /gtank/gluster1 | wc -l
>>> 1387
>>> all nodes but node1:~# ls -lR /gtank/gluster2 | wc -l
>>> 1387
>>> *Adding hardware back into the network after replacing bad harddrive(s)
>>> node1:~# glusterfsd -f /usr/local/etc/glusterfs/glusterfsd.vol
>>> node1:~# mount -t glusterfs /usr/local/etc/glusterfs/glusterfs.vol /gtank
>>> node3:~# ls -lR /gtank/ | wc -l
>>> 1664
>>> node3:~# ls -lR /gtank/gluster1 | wc -l
>>> 271
>>> node3:~# ls -lR /gtank/gluster2 | wc -l
>>> 1387
>>>
>>>
>>> ### Export volume "brick" with the contents of the "/tank" directory.
>>> volume posix
>>>   type storage/posix              # POSIX FS translator
>>>   option directory /tank          # Export this directory
>>> end-volume
>>>
>>> volume locks
>>>   type features/locks
>>>   subvolumes posix
>>> end-volume
>>>
>>> volume brick
>>>   type performance/io-threads
>>>   subvolumes locks
>>> end-volume
>>>
>>> ### Add network serving capability to the above brick.
>>> volume server
>>>   type protocol/server
>>>   option transport-type tcp
>>>   subvolumes brick
>>>   option auth.addr.brick.allow *    # Allow access to "brick" volume
>>>   option client-volume-filename /usr/local/etc/glusterfs/glusterfs.vol
>>> end-volume
>>>
>>>
>>> #
>>> # mirror block0
>>> #
>>> volume node1
>>>   type protocol/client
>>>   option transport-type tcp
>>>   option remote-host node1.ip       # IP address of the remote brick
>>>   # option transport-timeout 30     # seconds to wait for a reply from server for each request
>>>   option remote-subvolume brick     # name of the remote volume
>>> end-volume
>>>
>>> volume node2
>>>   type protocol/client
>>>   option transport-type tcp
>>>   option remote-host node2.ip       # IP address of the remote brick
>>>   # option transport-timeout 30     # seconds to wait for a reply from server for each request
>>>   option remote-subvolume brick     # name of the remote volume
>>> end-volume
>>>
>>> volume node3
>>>   type protocol/client
>>>   option transport-type tcp
>>>   option remote-host node3.ip       # IP address of the remote brick
>>>   # option transport-timeout 30     # seconds to wait for a reply from server for each request
>>>   option remote-subvolume brick     # name of the remote volume
>>> end-volume
>>>
>>> volume mirrorblock0
>>>   type cluster/replicate
>>>   subvolumes node1 node2 node3
>>>   option metadata-self-heal yes
>>> end-volume
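>>>
>>> To illustrate the swap mentioned above, here is a sketch of the kind
>>> of change I mean (only the subvolumes order differs from the config
>>> above). Whichever brick ends up listed first is the one whose self
>>> heal breaks, so with this ordering it would be node2 rather than
>>> node1:
>>>
>>> volume mirrorblock0
>>>   type cluster/replicate
>>>   subvolumes node2 node3 node1
>>>   option metadata-self-heal yes
>>> end-volume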
>>>
>>> Gordan Bobic wrote:
>>>>
>>>> First-access failing bug still seems to be present.
>>>> But other than that, it seems to be distinctly better than rc4. :)
>>>> Good work! :)
>>>>
>>>> Gordan
>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel