Hi,

I am currently testing GlusterFS with replication. I am running Ubuntu Hardy using packages from the PPA on launchpad.net. I am currently using glusterfs 2.0.6. I have 2 machines, each exporting 1 brick. This is the config I'm using:

----8<----8<----8<----8<----8<----8<----8<----8<----8<----
volume posix
  type storage/posix
  option directory /home/export/
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume cache
  type performance/io-cache
  subvolumes locks
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes cache
end-volume

### Add network serving capability to above brick.
volume server
  type protocol/server
  option transport-type tcp
  subvolumes brick
  option auth.addr.brick.allow * # Allow access to "brick" volume
end-volume
----8<----8<----8<----8<----8<----8<----8<----8<----8<----

I then have 2 clients (which happen to be the same 2 machines) that connect to both bricks and replicate them using this config:

----8<----8<----8<----8<----8<----8<----8<----8<----8<----
### Add client feature and attach to remote subvolume of server1
volume brick1
  type protocol/client
  option transport-type tcp
  option remote-host 172.19.45.102 # IP address of the remote brick
  option remote-subvolume brick # name of the remote volume
end-volume

### Add client feature and attach to remote subvolume of server2
volume brick2
  type protocol/client
  option transport-type tcp
  option remote-host 172.19.45.103 # IP address of the remote brick
  option remote-subvolume brick # name of the remote volume
end-volume

volume replicate
  type cluster/replicate
  subvolumes brick1 brick2
end-volume
----8<----8<----8<----8<----8<----8<----8<----8<----8<----

If I start the 2 servers up, then mount both clients, everything works fine. I have shared storage which is replicated to each host.
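
For reference, here is a minimal sketch of how one could check from a client whether each brick's server is accepting TCP connections before (or after) mounting. It is only an illustration: the host addresses are the ones from the client volfile, port 6996 is the one that shows up in the debug log further down, and the helper names `brick_reachable` and `check_bricks` are hypothetical, not part of GlusterFS.

```python
import socket

# Addresses taken from the client volfile; 6996 is the port seen in the
# debug log (an assumption about this particular setup, not a general rule).
BRICKS = {"brick1": "172.19.45.102", "brick2": "172.19.45.103"}
PORT = 6996


def brick_reachable(host, port=PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False


def check_bricks(bricks=BRICKS, port=PORT, probe=brick_reachable):
    """Map each brick name to True/False depending on whether it answers."""
    return {name: probe(host, port) for name, host in bricks.items()}

# Example use: print(check_bricks())
```

This only tests that something is listening on the port; it says nothing about whether the replicate translator on the client considers the subvolume up.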

If I shut the one brick down, the client on that machine also dies and I get strange errors:

----8<----8<----8<----8<----8<----8<----8<----8<----8<----
# cd /mnt/gluster
bash: cd: /mnt/gluster: Transport endpoint is not connected
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.5G  1.1G  7.9G  13% /
varrun                125M   68K  125M   1% /var/run
varlock               125M     0  125M   0% /var/lock
udev                  125M   44K  125M   1% /dev
devshm                125M     0  125M   0% /dev/shm
df: `/mnt/gluster': Transport endpoint is not connected
# mount
/dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
/sys on /sys type sysfs (rw,noexec,nosuid,nodev)
varrun on /var/run type tmpfs (rw,noexec,nosuid,nodev,mode=0755)
varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
udev on /dev type tmpfs (rw,mode=0755)
devshm on /dev/shm type tmpfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
securityfs on /sys/kernel/security type securityfs (rw)
/etc/glusterfs/glusterfs.vol on /mnt/gluster type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
----8<----8<----8<----8<----8<----8<----8<----8<----8<----

Here is a copy of debug logs:

[2009-10-01 08:16:15] D [glusterfsd.c:354:_get_specfp] glusterfs: loading volume file /etc/glusterfs/glusterfs.vol
================================================================================
Version      : glusterfs 2.0.6 built on Aug 31 2009 20:14:31
TLA Revision : v2.0.6
Starting Time: 2009-10-01 08:16:15
Command line : glusterfs --log-level=DEBUG --volfile=/etc/glusterfs/glusterfs.vol /mnt/gluster/
PID          : 17884
System name  : Linux
Nodename     : cj-cpt-molb01
Kernel Release : 2.6.24-24-server
Hardware Identifier: i686

Given volfile:
+------------------------------------------------------------------------------+
  1: ### Add client feature and attach to remote subvolume of server1
  2: volume brick1
  3: type protocol/client
  4: option transport-type tcp
  5: option remote-host 172.19.45.102 # IP address of the remote brick
  6: option remote-subvolume brick # name of the remote volume
  7: end-volume
  8:
  9: ### Add client feature and attach to remote subvolume of server2
 10: volume brick2
 11: type protocol/client
 12: option transport-type tcp
 13: option remote-host 172.19.45.103 # IP address of the remote brick
 14: option remote-subvolume brick # name of the remote volume
 15: end-volume
 16:
 17: volume replicate
 18: type cluster/replicate
 19: subvolumes brick1 brick2
 20: end-volume
+------------------------------------------------------------------------------+
[2009-10-01 08:16:15] D [glusterfsd.c:1205:main] glusterfs: running in pid 17884
[2009-10-01 08:16:15] D [client-protocol.c:5952:init] brick1: defaulting frame-timeout to 30mins
[2009-10-01 08:16:15] D [client-protocol.c:5963:init] brick1: defaulting ping-timeout to 10
[2009-10-01 08:16:15] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
[2009-10-01 08:16:15] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
[2009-10-01 08:16:15] D [client-protocol.c:5952:init] brick2: defaulting frame-timeout to 30mins
[2009-10-01 08:16:15] D [client-protocol.c:5963:init] brick2: defaulting ping-timeout to 10
[2009-10-01 08:16:15] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
[2009-10-01 08:16:15] D [transport.c:141:transport_load] transport: attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] N [glusterfsd.c:1224:main] glusterfs: Successfully started
[2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick1: got GF_EVENT_CHILD_UP
[2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick1: got GF_EVENT_CHILD_UP
[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] brick1: Connected to 172.19.45.102:6996, attached to remote volume 'brick'.
[2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume 'brick1' came back up; going online.
[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] brick1: Connected to 172.19.45.102:6996, attached to remote volume 'brick'.
[2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume 'brick1' came back up; going online.
[2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick2: got GF_EVENT_CHILD_UP
[2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick2: got GF_EVENT_CHILD_UP
[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] brick2: Connected to 172.19.45.103:6996, attached to remote volume 'brick'.
[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] brick2: Connected to 172.19.45.103:6996, attached to remote volume 'brick'.
[2009-10-01 08:17:24] N [client-protocol.c:6246:notify] brick1: disconnected
[2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1: connection to 172.19.45.102:6996 failed (Connection refused)
[2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1: connection to 172.19.45.102:6996 failed (Connection refused)

Any ideas?

-- 
Adrian Moisey
Systems Designer | CareerJunction | Better jobs. More often.
Web: www.careerjunction.co.za | Email: adrian at careerjunction.co.za
Phone: +27 21 818 8621 | Mobile: +27 82 858 7830 | Fax: +27 21 818 8855