Hi Adrian,

Correct me if I have misunderstood: you have 2 servers, and a client
replicates to both of them. When the first server goes down, the client
stops responding as well.

You also mentioned more than one client. Can you clarify that, so that
we can try to understand the issue?

Pavan

On 01/10/09 08:41 +0200, Adrian Moisey wrote:
> Hi
>
> I am currently testing GlusterFS with replication.
> I am running Ubuntu hardy using packages from the PPA on launchpad.net.
> I am currently using glusterfs 2.0.6.
>
> I have 2 machines, both exporting 1 brick each. This is the config I'm
> using:
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
> volume posix
>   type storage/posix
>   option directory /home/export/
> end-volume
>
> volume locks
>   type features/locks
>   subvolumes posix
> end-volume
>
> volume cache
>   type performance/io-cache
>   subvolumes locks
> end-volume
>
> volume brick
>   type performance/io-threads
>   option thread-count 8
>   subvolumes cache
> end-volume
>
> ### Add network serving capability to above brick.
> volume server
>   type protocol/server
>   option transport-type tcp
>   subvolumes brick
>   option auth.addr.brick.allow * # Allow access to "brick" volume
> end-volume
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
>
> I then have 2 clients (which happen to be the same 2 machines) that
> connect to both bricks and replicate them using this config:
>
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
> ### Add client feature and attach to remote subvolume of server1
> volume brick1
>   type protocol/client
>   option transport-type tcp
>   option remote-host 172.19.45.102 # IP address of the remote brick
>   option remote-subvolume brick # name of the remote volume
> end-volume
>
> ### Add client feature and attach to remote subvolume of server2
> volume brick2
>   type protocol/client
>   option transport-type tcp
>   option remote-host 172.19.45.103 # IP address of the remote brick
>   option remote-subvolume brick # name of the remote volume
> end-volume
>
> volume replicate
>   type cluster/replicate
>   subvolumes brick1 brick2
> end-volume
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
>
> If I start the 2 servers up, then mount both clients everything works
> fine. I have shared storage which is replicated to each host.
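
To see at a glance which servers this client volfile depends on, the
protocol/client entries can be pulled out with a few lines of Python.
This is only an illustrative sketch: the client_remote_hosts helper is
made up for this example and is not part of GlusterFS, and it assumes
the simple "volume ... end-volume" layout quoted above.

```python
# Client volfile as quoted in this thread.
CLIENT_VOLFILE = """
### Add client feature and attach to remote subvolume of server1
volume brick1
  type protocol/client
  option transport-type tcp
  option remote-host 172.19.45.102 # IP address of the remote brick
  option remote-subvolume brick # name of the remote volume
end-volume

### Add client feature and attach to remote subvolume of server2
volume brick2
  type protocol/client
  option transport-type tcp
  option remote-host 172.19.45.103 # IP address of the remote brick
  option remote-subvolume brick # name of the remote volume
end-volume

volume replicate
  type cluster/replicate
  subvolumes brick1 brick2
end-volume
"""


def client_remote_hosts(volfile_text):
    """Map each protocol/client volume name to its remote-host value.

    Hypothetical helper for illustration only, not a GlusterFS API.
    """
    hosts = {}
    name = vol_type = host = None
    for raw in volfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        words = line.split()
        if words[0] == "volume" and len(words) > 1:
            # A new volume block starts; reset what we know about it.
            name, vol_type, host = words[1], None, None
        elif words[0] == "type" and len(words) > 1:
            vol_type = words[1]
        elif words[:2] == ["option", "remote-host"] and len(words) > 2:
            host = words[2]
        elif words[0] == "end-volume":
            # Only client volumes that name a remote host are recorded.
            if vol_type == "protocol/client" and host:
                hosts[name] = host
            name = vol_type = host = None
    return hosts
```

Calling client_remote_hosts(CLIENT_VOLFILE) returns one entry per
protocol/client volume, which makes it clear that every client needs to
reach both 172.19.45.102 and 172.19.45.103.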
>
> If I shut the one brick down, the client on that machine also dies and I
> get strange errors:
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
> # cd /mnt/gluster
> bash: cd: /mnt/gluster: Transport endpoint is not connected
> # df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda1             9.5G  1.1G  7.9G  13% /
> varrun                125M   68K  125M   1% /var/run
> varlock               125M     0  125M   0% /var/lock
> udev                  125M   44K  125M   1% /dev
> devshm                125M     0  125M   0% /dev/shm
> df: `/mnt/gluster': Transport endpoint is not connected
> # mount
> /dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)
> proc on /proc type proc (rw,noexec,nosuid,nodev)
> /sys on /sys type sysfs (rw,noexec,nosuid,nodev)
> varrun on /var/run type tmpfs (rw,noexec,nosuid,nodev,mode=0755)
> varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
> udev on /dev type tmpfs (rw,mode=0755)
> devshm on /dev/shm type tmpfs (rw)
> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
> securityfs on /sys/kernel/security type securityfs (rw)
> /etc/glusterfs/glusterfs.vol on /mnt/gluster type fuse.glusterfs
> (rw,allow_other,default_permissions,max_read=131072)
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
>
> Here is a copy of debug logs:
> [2009-10-01 08:16:15] D [glusterfsd.c:354:_get_specfp] glusterfs:
> loading volume file /etc/glusterfs/glusterfs.vol
> ================================================================================
> Version : glusterfs 2.0.6 built on Aug 31 2009 20:14:31
> TLA Revision : v2.0.6
> Starting Time: 2009-10-01 08:16:15
> Command line : glusterfs --log-level=DEBUG
> --volfile=/etc/glusterfs/glusterfs.vol /mnt/gluster/
> PID : 17884
> System name : Linux
> Nodename : cj-cpt-molb01
> Kernel Release : 2.6.24-24-server
> Hardware Identifier: i686
>
> Given volfile:
> +------------------------------------------------------------------------------+
> 1: ### Add client feature and attach to remote subvolume of server1
> 2: volume brick1
> 3: type protocol/client
> 4: option transport-type tcp
> 5: option remote-host 172.19.45.102 # IP address of the remote brick
> 6: option remote-subvolume brick # name of the remote volume
> 7: end-volume
> 8:
> 9: ### Add client feature and attach to remote subvolume of server2
> 10: volume brick2
> 11: type protocol/client
> 12: option transport-type tcp
> 13: option remote-host 172.19.45.103 # IP address of the remote brick
> 14: option remote-subvolume brick # name of the remote volume
> 15: end-volume
> 16:
> 17: volume replicate
> 18: type cluster/replicate
> 19: subvolumes brick1 brick2
> 20: end-volume
>
> +------------------------------------------------------------------------------+
> [2009-10-01 08:16:15] D [glusterfsd.c:1205:main] glusterfs: running in
> pid 17884
> [2009-10-01 08:16:15] D [client-protocol.c:5952:init] brick1: defaulting
> frame-timeout to 30mins
> [2009-10-01 08:16:15] D [client-protocol.c:5963:init] brick1: defaulting
> ping-timeout to 10
> [2009-10-01 08:16:15] D [transport.c:141:transport_load] transport:
> attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
> [2009-10-01 08:16:15] D [transport.c:141:transport_load] transport:
> attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
> [2009-10-01 08:16:15] D [client-protocol.c:5952:init] brick2: defaulting
> frame-timeout to 30mins
> [2009-10-01 08:16:15] D [client-protocol.c:5963:init] brick2: defaulting
> ping-timeout to 10
> [2009-10-01 08:16:15] D [transport.c:141:transport_load] transport:
> attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
> [2009-10-01 08:16:15] D [transport.c:141:transport_load] transport:
> attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D
[client-protocol.c:6280:notify] brick2: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] N [glusterfsd.c:1224:main] glusterfs: Successfully
> started
> [2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick1: got
> GF_EVENT_CHILD_UP
> [2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick1: got
> GF_EVENT_CHILD_UP
> [2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk]
> brick1: Connected to 172.19.45.102:6996, attached to remote volume
> 'brick'.
> [2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume
> 'brick1' came back up; going online.
> [2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk]
> brick1: Connected to 172.19.45.102:6996, attached to remote volume
> 'brick'.
> [2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume
> 'brick1' came back up; going online.
> [2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick2: got
> GF_EVENT_CHILD_UP
> [2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick2: got
> GF_EVENT_CHILD_UP
> [2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk]
> brick2: Connected to 172.19.45.103:6996, attached to remote volume
> 'brick'.
> [2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk]
> brick2: Connected to 172.19.45.103:6996, attached to remote volume
> 'brick'.
> [2009-10-01 08:17:24] N [client-protocol.c:6246:notify] brick1: disconnected
> [2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1:
> connection to 172.19.45.102:6996 failed (Connection refused)
> [2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1:
> connection to 172.19.45.102:6996 failed (Connection refused)
>
> Any ideas?
>
> --
> Adrian Moisey
> Systems Designer | CareerJunction | Better jobs. More often.
> Web: www.careerjunction.co.za | Email: adrian at careerjunction.co.za
> Phone: +27 21 818 8621 | Mobile: +27 82 858 7830 | Fax: +27 21 818 8855
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
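
One check that might help narrow this down: the log shows both bricks
answering on port 6996 at startup, and "Connection refused" for brick1
is what one would expect once that server is stopped. It would be useful
to know whether the other brick is still reachable from the client at
that moment. Below is a minimal Python sketch for that. It is not a
GlusterFS tool; the function names and the 3 second timeout are
assumptions of mine, and the addresses and port are copied from the log.

```python
import socket

# Addresses and port copied from the log in this thread; adjust as needed.
BRICKS = {
    "brick1": ("172.19.45.102", 6996),
    "brick2": ("172.19.45.103", 6996),
}


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and unreachable hosts.
        return False


def probe_bricks(bricks, timeout=3.0):
    """Return {brick name: True/False} for every (host, port) pair."""
    return {
        name: is_reachable(host, port, timeout)
        for name, (host, port) in bricks.items()
    }
```

Running print(probe_bricks(BRICKS)) on the client machine after stopping
one server should report the stopped brick as False. If the other brick
is also False, the problem is probably below GlusterFS (network or
firewall); if it is True, the replicate translator is the place to look.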