Daniel,

There were fixes that went into HA recently. Can you check if the bug is
still there?

Krishna

On Wed, Jan 14, 2009 at 11:22 PM, Daniel Maher <dma+gluster at witbe.net> wrote:
> Hi all,
>
> In testing the HA translator under 2.0.0rc1, i've managed to create a
> simple and reproducible scenario in which Gluster fails to maintain
> communication between the client and the server(s).
>
> Server01 and Server02 are AFR'ing each other, with Client01 connected
> via the HA translator. As a simple test, i launch a script that echoes
> an increasing counter to a text file in the Gluster mount on Client01.
> Client01 is communicating with Server01 in this instance.
>
> I cleanly stop glusterfsd on Server01, and after a momentary hiccup
> (noted in the log excerpt below), things continue to function as
> expected - Client01 commences communication with Server02. So far so good.
>
> 2009-01-15 15:54:19 E [socket.c:708:socket_connect_finish] export01:
> connection failed (Connection refused)
>
> I re-start glusterfsd on Server01, then, i cleanly stop glusterfsd on
> Server02 (which, of course, Client01 is now communicating with).
> Client01 freaks out (see log excerpt below), does /not/ attempt to
> contact Server01 again, and leaves me with the dreaded "transport
> endpoint not connected" situation.
>
> 2009-01-15 16:06:02 E [ha-helpers.c:266:_ha_next_active_child_for_ctx]
> export-ha: none of the children are connected other than export02
> 2009-01-15 16:06:02 E [ha.c:2715:ha_fstat_cbk] export-ha: no active
> subvolume
> 2009-01-15 16:06:02 E [fuse-bridge.c:533:fuse_attr_cbk] glusterfs-fuse:
> 2932: FSTAT() /counter.txt => -1 (Transport endpoint is not connected)
>
> Client01 sometimes recovers from this, and sometimes it does not. When
> it does not recover from this situation, the only solution is manual
> intervention (unmount / remount). That's not the worst of it, though :
> when it /does/ recover, re-starting glusterfsd on Server02 (!)
> causes even more of the errors (see below), and /always/ results in a total
> failure on Client01 within a second or two (transport endpoint not
> connected). Client01 never recovers from this.
>
> 2009-01-15 19:04:56 E [ha-helpers.c:266:_ha_next_active_child_for_ctx]
> export-ha: none of the children are connected other than export01
> 2009-01-15 19:04:56 E [ha.c:2515:ha_flush_cbk] export-ha: no active
> subvolume
> 2009-01-15 19:04:56 E [fuse-bridge.c:911:fuse_err_cbk] glusterfs-fuse:
> 3058: FLUSH() ERR => -1 (Transport endpoint is not connected)
>
>
> I strongly suspect this is not the expected behaviour of the High
> Availability translator. :)
>
>
> Servers are running FC9 i386, Client is FC10 i386.
>
> # glusterfs --version
> glusterfs 2.0.0rc1 built on Jan 14 2009 13:19:06
> Repository revision: glusterfs--mainline--3.0--patch-844
>
> # rpm -qa | grep fuse
> fuse-2.7.3glfs10-1.i386
> fuse-devel-2.7.3glfs10-1.i386
> fuse-libs-2.7.3glfs10-1.i386
>
>
> Server config :
>
> # cat /etc/glusterfs/glusterfs-server.vol
> # dataspace
> volume test-ds
>   type storage/posix
>   option directory /opt/datadir
> end-volume
>
> # posix locks for test-ds
> volume test-ds-locks
>   type features/locks
>   option mandatory-locks on
>   subvolumes test-ds
> end-volume
>
> # dataspace of test-ds on Server01
> volume test-01-ds
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.183
>   option remote-subvolume test-ds-locks
>   option transport-timeout 10
> end-volume
>
> # automatic file replication translator for test dataspace
> volume test-ds-afr
>   type cluster/afr
>   subvolumes test-ds-locks test-01-ds
> end-volume
>
> # the actual export
> volume export
>   type performance/io-threads
>   option thread-count 8
>   subvolumes test-ds-afr
> end-volume
>
> # server declaration
> volume server
>   type protocol/server
>   option transport-type tcp/server
>   subvolumes export
>   option auth.addr.export.allow 192.168.0.73,192.168.0.183,192.168.0.166,127.0.0.1
>   option auth.addr.test-ds-locks.allow 192.168.0.73,192.168.0.183,192.168.0.166,127.0.0.1
> end-volume
>
>
>
> client config :
> # cat /etc/glusterfs/glusterfs-client.vol
>
> # export on Server01
> volume export01
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.183
>   option remote-subvolume export # exported volume
> end-volume
>
> # export on Server02
> volume export02
>   type protocol/client
>   option transport-type tcp/client
>   option remote-host 192.168.0.166
>   option remote-subvolume export # exported volume
> end-volume
>
> # exports clustered via HA
> volume export-ha
>   type cluster/ha
>   subvolumes export01 export02
> end-volume
>
>
>
> --
> Daniel Maher <dma+gluster AT witbe DOT net>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
>
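
For anyone re-running the test from the quoted report against a newer build, the counter script it describes could look roughly like the sketch below. The original script was not posted, so everything here is an assumption: the Python language choice, the names `write_counter` and `describe`, and the decision to re-open and append on every write are all hypothetical. The sketch records which counter values failed and with which errno, so that a "Transport endpoint is not connected" failure (ENOTCONN) can be lined up against the times glusterfsd was stopped and restarted.

```python
import errno
import time


def write_counter(path, iterations, delay=0.0):
    """Append the values 1..iterations to `path`, one per line.

    Returns a list of (count, errno) pairs for every write that failed,
    rather than aborting on the first error.
    """
    failures = []
    for count in range(1, iterations + 1):
        try:
            # Re-open on each iteration so every write goes through the
            # mount afresh (an assumption; the original script may differ).
            with open(path, "a") as handle:
                handle.write(f"{count}\n")
        except OSError as exc:
            failures.append((count, exc.errno))
        if delay:
            time.sleep(delay)
    return failures


def describe(failures):
    """Render failures as readable lines using symbolic errno names."""
    lines = []
    for count, code in failures:
        name = errno.errorcode.get(code, str(code))
        lines.append(f"write {count} failed: {name}")
    return lines
```

Calling something like `write_counter("/mnt/glusterfs/counter.txt", 600, delay=1.0)` (the mount path is hypothetical) while stopping and restarting glusterfsd on each server in turn, then printing `describe(...)` on the result, should show whether the client still loses its connection after the second failover.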