Please consider the master branch unusable for a few more weeks. You can
check out the 'release-2.0' branch to stay up to date with the latest of
the stable branch.

Avati

On Sun, Sep 6, 2009 at 3:12 PM, Mark Mielke <mark@xxxxxxxxxxxxxx> wrote:
> Ok - I think this turns out to be a GlusterFS 2.1.0-git specific bug, but
> I've included all of the details:
>
> My first use of GlusterFS is using GlusterFS 2.1.0-git on Fedora 11 / x86_64
> with ext4 partitions. / is an ext4 partition. /export/gluster-test is a
> different ext4 partition. I do use NFS + AutoFS, and NFS does export other
> partitions under /export. This is to be a very simple client/server.
>
> For the server, I have this:
>
> # cat /export/gluster-test-server.vol
> volume brick
>   type storage/posix
>   option directory /export/gluster-test/
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type tcp
>   subvolumes brick
>   option auth.addr.brick.allow 47.134.128.*
> end-volume
>
> For the client, I have this:
>
> # cat /export/gluster-test-client.vol
> volume brick1
>   type protocol/client
>   option transport-type tcp
>   option remote-host 47.134.128.21
>   #option remote-port 7000
>   option remote-subvolume brick
> end-volume
>
> The server/client IP is 47.134.128.21. There are no firewalls active on this
> machine at this time.
>
> To launch the server, I used: (GlusterFS 2.1.0-git installed in
> /opt/glusterfs)
>
> # /opt/glusterfs/sbin/glusterfsd --volfile=/export/gluster-test-server.vol
>
> To launch the client and mount, I used:
>
> # mkdir /tmp/t
> # mount -t glusterfs /export/gluster-test-client.vol /tmp/t
> ... output says FUSE initialized ...
> # cd /tmp/t
>
> From this point, I *appeared* to be able to modify /tmp/t. However, it turns
> out that the mount did not actually complete, and I was just changing /tmp/t
> under /tmp, not under GlusterFS.
> I believe this matches the documented usage
> under gluster.org:
>
> bash# mount -t glusterfs /etc/glusterfs/glusterfs.vol /mnt/glusterfs
>
> I determined that I was able to sudo / su / login / run commands from
> "/bin", however when I did "ls /" or "ls /export", everything would freeze
> and "/sbin/shutdown -r now" would not complete. "cd /export" would also
> freeze. I suspect that "ls /" does stat("/export") and this is why it
> freezes. During this investigation period, I noticed the ps output was
> strange:
>
> root      2312     1  0 16:10 ?        00:00:00
>     /opt/glusterfs/sbin/glusterfsd --volfile=gluster-test-server.vol
> root      2370     1  0 16:11 ?        00:00:00
>     /opt/glusterfs/sbin/glusterfs --log-level=NORMAL
>     --volfile=/export/gluster-test-client.vol /export
> root      2385  2370  0 16:11 ?        00:00:00 /bin/mount -i -f -t
>     fuse.glusterfs -o allow_other,default_permissions,max_read=131072
>     /export/gluster-test-client.vol /export
> root      2577  2467  0 16:13 tty4     00:00:00 grep gluster
>
> Why is it trying to mount on /export?
>
> I ran this test multiple times - each time my "mount -t glusterfs" was on
> /tmp/t - I never used /export. Each time, it had the same results - the
> /sbin/mount.glusterfs was somehow translating it to /export. I decided to
> trace some of the processes, and found that I could "strace -p" glusterfsd
> and glusterfs, but "strace -p" of 2385 would freeze. Control-C was frozen
> for all of these, including the "strace -p", however, if I did "kill -9"
> (regular kill did not work) of the /bin/mount process, then the "strace -p"
> would come back. Finally, once I had killed /bin/mount *three* times (it
> came back twice?) and killed glusterfs, the system went back to normal with
> no freezes. During this, I also did a df on /tmp/t which showed that /tmp/t
> was /, but df in general (which presumably was trying to query /export)
> would freeze.
>
> To confirm this thinking, I started the glusterfs mount directly:
>
> # /opt/glusterfs/bin/glusterfs --volfile=/export/gluster-test-client.vol
>     /tmp/t
>
> And it worked perfectly - no freeze, and /tmp/t was a proper glusterfs
> mount. Changes to /tmp/t were reflected in /export/gluster-test.
>
> I also determined that the complete system freeze and failure to
> "/sbin/shutdown -r now" was due to NFS failing to shut down properly
> while the system was in the "frozen" state. If I restarted the whole
> scenario, but ensured that both "nfs" and "autofs" were NOT running, then
> although accesses to /export would freeze, I was able to restart the system
> using "/sbin/shutdown -r now" or Ctrl-Alt-Del from the console. So, the real
> freeze was that any access to /export would become stuck in the kernel like
> an NFS hard mount. I did not wait around to see if it would time out after
> 30 minutes as I was running these tests in quick succession and my family
> was waiting for me outside in the car. :-)
>
> Thinking about the above - I think /sbin/mount.glusterfs must have a problem
> for it to use /export even though I passed in /tmp - but, this is not the
> only problem. There is also some sort of other failure that causes system
> lockup instead of clean failure. One scenario I can think of is that it is
> trying to mount /export against something /export/gluster-test, and this
> might be leading to some sort of loop? I think /export was being put in a
> half-mounted state, where it was being controlled by FUSE/GlusterFS, but
> GlusterFS was not able to serve any requests on it?
>
> Going back to /sbin/mount.glusterfs, here is a more exact test showing this
> problem:
>
> [root@wcarh033]/# mount -t glusterfs /export/gluster-test-client.vol /tmp
> [root@wcarh033]/# ps -ef | grep gluster
> root      3221     1  0 17:54 ?        00:00:00
>     /opt/glusterfs/sbin/glusterfs --log-level=NORMAL
>     --volfile=/export/gluster-test-client.vol /export
> root      3232  3221  0 17:54 ?        00:00:00 /bin/mount -i -f -t
>     fuse.glusterfs -o allow_other,default_permissions,max_read=131072
>     /export/gluster-test-client.vol /export
> root      3238  3151  0 17:54 pts/0    00:00:00 grep gluster
>
> If I try to recover from this, I can recover from the freeze, but not from
> the whole situation:
>
> [root@wcarh033]/# kill -9 3221
> [root@wcarh033]/# ps -ef | grep gluster
> root      3232     1  0 17:54 ?        00:00:00 /bin/mount -i -f -t
>     fuse.glusterfs -o allow_other,default_permissions,max_read=131072
>     /export/gluster-test-client.vol /export
> root      3243  3151  0 17:56 pts/0    00:00:00 grep gluster
> [root@wcarh033]/# kill -9 3232
> [root@wcarh033]/# ps -ef | grep gluster
> root      3245  3151  0 17:56 pts/0    00:00:00 grep gluster
> [root@wcarh033]/# ls /export
> ls: cannot access /export: Transport endpoint is not connected
>
> I rebooted the machine to clean up from that, at least for now.
>
> Where is /export coming from? It's on the command line - I wonder if the
> command line parsing is broken?
>
> In /sbin/mount.glusterfs, I see these lines which do not appear in GlusterFS
> 2.0.6:
>
> mount_provided=$(echo "$@" | cut -f2 -d'/');
>
> [ -n "$mount_provided" ] && {
>     mount_point="/$mount_provided";
> }
>
> [ -z "$mount_point" ] && {
>     usage;
>     exit 0;
> }
>
>
> Before, it used to say:
>
> mount_point="$2";
>
> If I switch the code back to what it used to be, my original test works
> fine. No freeze. Whoohoo!
>
> Please fix in GIT. Thanks.
>
> Cheers,
> mark
>
> --
> Mark Mielke <mark@xxxxxxxxx>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
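
For anyone trying to see where /export comes from, the quoted lines can be
reproduced outside of mount.glusterfs. This is only a sketch: the function
names parse_broken and parse_positional are made up for illustration and are
not part of the GlusterFS scripts; only the cut expression and the "$2"
assignment are taken from the report above. Because echo "$@" joins every
argument into one string, cut -f2 -d'/' returns whatever sits between the
first and second '/' of the volfile path, not the mount point.

```shell
#!/bin/sh
# Hypothetical wrapper around the 2.1.0-git lines quoted in the report.
parse_broken() {
    # All arguments become one string; field 2 (delimiter '/') is the
    # first path component of the FIRST argument, i.e. of the volfile.
    mount_provided=$(echo "$@" | cut -f2 -d'/')
    echo "/$mount_provided"
}

# Hypothetical wrapper around the older behaviour: the mount point is
# simply the second positional argument.
parse_positional() {
    echo "$2"
}

parse_broken /export/gluster-test-client.vol /tmp/t      # prints /export
parse_positional /export/gluster-test-client.vol /tmp/t  # prints /tmp/t
```

With the volfile under /export, the derived mount point is /export no matter
what is passed as the second argument, which matches the ps output in the
report.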