Hi Sam,

A fix is in the works regarding the order of the subvols you mentioned.

Krishna

On Feb 19, 2008 8:21 AM, Sam Douglas <sam.douglas32@xxxxxxxxx> wrote:
> Hi,
>
> == Background ==
>
> We are setting up GlusterFS on a compute cluster. Each node has two
> disk partitions, /media/gluster1 and /media/gluster2, which are used
> for the cluster storage.
>
> We are currently using builds from TLA (671 as of now).
>
> I have a script that generates GlusterFS client configurations, creating
> AFR instances over pairs of nodes in the cluster. A snippet from our
> current configuration:
>
> # Client definitions
> volume client-cn2-1
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host cn2
>   option remote-subvolume brick1
> end-volume
>
> volume client-cn2-2
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host cn2
>   option remote-subvolume brick2
> end-volume
>
> volume client-cn3-1
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host cn3
>   option remote-subvolume brick1
> end-volume
>
> volume client-cn3-2
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host cn3
>   option remote-subvolume brick2
> end-volume
>
> ### snip - you get the idea ###
>
> # Generated AFR volumes
> volume afr-cn2-cn3
>   type cluster/afr
>   subvolumes client-cn2-1 client-cn3-2
> end-volume
>
> volume afr-cn3-cn4
>   type cluster/afr
>   subvolumes client-cn3-1 client-cn4-2
> end-volume
>
> ### and so on ###
>
> volume unify
>   type cluster/unify
>   option scheduler rr
>   option namespace namespace
>   subvolumes afr-cn2-cn3 afr-cn3-cn4 afr-cn4-cn5 ...
> end-volume
>
>
> == Self healing program ==
>
> I wrote a quick C program (medic) that uses the nftw function to open
> all files in a directory tree and readlink all symlinks. This seems
> effective at forcing AFR to heal. (A rough sketch of such a walker is
> appended after this message.)
>
>
> == Playing with AFR ==
>
> We have a test cluster of 6 nodes set up.
>
> In this setup, cluster node 2 is involved in 'afr-cn2-cn3' and
> 'afr-cn7-cn2'.
>
> I copy a large directory tree (such as /usr) onto the cluster
> filesystem, then 'cripple' node cn2 by deleting the data from its
> backends and restarting glusterfsd on that system, to emulate the
> system going offline / losing data.
>
> (At this point, all the data is still available on the filesystem.)
>
> Running medic over the filesystem mount will now cause the data to be
> copied back onto cn2's appropriate volumes and all is happy.
>
> Opening all files on the filesystem seems like a stupid waste of time
> if you know which volumes have gone down (and when you have over 20TB
> in hundreds of thousands of files, that is a considerable waste of
> time), so I looked into mounting parts of the client translator tree
> on separate mount points and running medic over those.
>
> # mkdir /tmp/glfs
> # generate_client_conf > /tmp/glusterfs.vol
> # glusterfs -f /tmp/glusterfs.vol -n afr-cn2-cn3 /tmp/glfs
> # ls /tmp/glfs
> home/
> [Should be: home/ usr/]
>
> A `cd /tmp/glfs/usr/` will succeed and usr/ will be self-healed, but
> its contents will not be. Likewise, a `cat /tmp/glfs/usr/include/stdio.h`
> will output the contents of the file and cause it to be self-healed.
>
> Changing the order of the subvolumes of the 'afr-cn2-cn3' volume so
> that the up-to-date client is the first subvolume causes the directory
> to be listed correctly (a reordered snippet is appended after this
> message).
>
> This seems to me like a minor-ish bug in cluster/afr's readdir
> functionality.
>
> -- Sam Douglas
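
The "medic" walker described above presumably looks something like the sketch below: a plain nftw() traversal that open()s every regular file and readlink()s every symlink, so each entry is touched and AFR's self-heal is triggered. The actual medic source is not shown in this thread, so the names and details here are assumptions, not the real program.

/* Rough sketch of a medic-style self-heal walker (assumed, not the real
 * medic source): open every regular file and readlink every symlink so
 * that AFR notices stale/missing copies and heals them. */
#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

static int visit(const char *path, const struct stat *sb,
                 int typeflag, struct FTW *ftwbuf)
{
    (void) sb;
    (void) ftwbuf;

    if (typeflag == FTW_F) {
        /* Opening (and reading a byte of) the file is enough to make AFR
           look at all copies and self-heal the stale one. */
        int fd = open(path, O_RDONLY);
        if (fd >= 0) {
            char buf[1];
            (void) read(fd, buf, sizeof(buf));
            close(fd);
        } else {
            perror(path);
        }
    } else if (typeflag == FTW_SL) {
        /* readlink() forces the symlink itself to be healed. */
        char target[4096];
        if (readlink(path, target, sizeof(target) - 1) < 0)
            perror(path);
    }

    return 0;  /* keep walking */
}

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <directory>\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* FTW_PHYS: do not follow symlinks; report them as FTW_SL so the
       callback can readlink() them instead of descending through them. */
    if (nftw(argv[1], visit, 64, FTW_PHYS) < 0) {
        perror("nftw");
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}

Compiled with something like `gcc -o medic medic.c` and run as `./medic /mnt/glusterfs` (paths assumed), it simply visits every entry once, which is what forces the heal.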
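
Likewise, the subvolume-order workaround mentioned near the end amounts to swapping the two clients on the subvolumes line of the affected AFR volume so that the up-to-date client comes first. For the test above, where cn2's backends were wiped and cn3 still holds the data, the reordered definition would presumably read:

volume afr-cn2-cn3
  type cluster/afr
  subvolumes client-cn3-2 client-cn2-1
end-volume

That is the same volume as in the configuration snippet above, with only the order of its two subvolumes reversed.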