> When you first mount your volume, look in the client log and see if it's connecting to both bricks.
> I suspect it's not and that the failure is related to firewall settings.

Logs from both nodes are below. For this test, I first ran "umount /firewall-scripts" on both nodes. Then I ran "mount -av" using the default parameters in my fstab file. I did **not** turn on backupvolfile-server=<secondary server> for this test. In another window, I ran "tail /var/log/glusterfs/firewall-scripts.log -f", and you can see the spot where I mounted my file system again.

Note that everything works as expected when both nodes are online, which suggests everyone can see everyone else in steady state. Also note that backupvolfile-server=<secondary server> changed the behavior; I documented this in an earlier post.

> ...the failure is related to firewall settings.

No way. I'm wide open on the interface I'm using for heartbeat and glusterfs. In my application, I take node fw1 offline by inserting a firewall rule and then removing it a few seconds later. For testing right now, I just insert the rule by hand, look at a bunch of stuff, then remove it later. But since you brought it up, I cleaned out all firewall rules before doing and logging the mounts below. As near as I can tell, everyone can see everyone else, and the logs look the same to my eye as they did before I dropped all the (not relevant) firewall rules.

Log from fw1:

[root@chicago-fw1 ~]#
[root@chicago-fw1 ~]# tail /var/log/glusterfs/firewall-scripts.log -f
[2013-07-11 15:51:54.423508] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, attached to remote volume '/gluster-fw2'.
[2013-07-11 15:51:54.423576] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 15:51:54.440124] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-11 15:51:54.440660] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-11 15:51:54.440886] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-11 15:51:54.442235] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-11 15:51:54.443451] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-0
[2013-07-11 16:21:22.729423] I [fuse-bridge.c:4583:fuse_thread_proc] 0-fuse: unmounting /firewall-scripts
[2013-07-11 16:21:22.730976] W [glusterfsd.c:970:cleanup_and_exit] (-->/usr/lib64/libc.so.6(clone+0x6d) [0x7f7a69fee13d] (-->/usr/lib64/libpthread.so.0(+0x33c1607c53) [0x7f7a6a684c53] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7f7a6b372e35]))) 0-: received signum (15), shutting down
[2013-07-11 16:21:22.731040] I [fuse-bridge.c:5212:fini] 0-fuse: Unmounting '/firewall-scripts'.

Blank space - mount -av below.
[2013-07-11 16:39:36.625696] I [glusterfsd.c:1878:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0beta3 (/usr/sbin/glusterfs --volfile-id=/firewall-scripts --volfile-server=192.168.253.1 /firewall-scripts)
[2013-07-11 16:39:36.640661] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-07-11 16:39:36.640800] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2013-07-11 16:39:36.672416] I [socket.c:3480:socket_init] 0-firewall-scripts-client-1: SSL support is NOT enabled
[2013-07-11 16:39:36.672539] I [socket.c:3495:socket_init] 0-firewall-scripts-client-1: using system polling thread
[2013-07-11 16:39:36.674545] I [socket.c:3480:socket_init] 0-firewall-scripts-client-0: SSL support is NOT enabled
[2013-07-11 16:39:36.674667] I [socket.c:3495:socket_init] 0-firewall-scripts-client-0: using system polling thread
[2013-07-11 16:39:36.675015] I [client.c:2154:notify] 0-firewall-scripts-client-0: parent translators are ready, attempting connect on transport
[2013-07-11 16:39:36.686253] I [client.c:2154:notify] 0-firewall-scripts-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume firewall-scripts-client-0
  2:     type protocol/client
  3:     option password fb3955b7-a6ca-49bb-b886-d4b6609392f8
  4:     option username de6eacd1-31bc-4bdb-a049-776cd840059e
  5:     option transport-type tcp
  6:     option remote-subvolume /gluster-fw1
  7:     option remote-host 192.168.253.1
  8: end-volume
  9:
 10: volume firewall-scripts-client-1
 11:     type protocol/client
 12:     option password fb3955b7-a6ca-49bb-b886-d4b6609392f8
 13:     option username de6eacd1-31bc-4bdb-a049-776cd840059e
 14:     option transport-type tcp
 15:     option remote-subvolume /gluster-fw2
 16:     option remote-host 192.168.253.2
 17: end-volume
 18:
 19: volume firewall-scripts-replicate-0
 20:     type cluster/replicate
 21:     subvolumes firewall-scripts-client-0 firewall-scripts-client-1
 22: end-volume
 23:
 24: volume firewall-scripts-dht
 25:     type cluster/distribute
 26:     subvolumes firewall-scripts-replicate-0
 27: end-volume
 28:
 29: volume firewall-scripts-write-behind
 30:     type performance/write-behind
 31:     subvolumes firewall-scripts-dht
 32: end-volume
 33:
 34: volume firewall-scripts-read-ahead
 35:     type performance/read-ahead
 36:     subvolumes firewall-scripts-write-behind
 37: end-volume
 38:
 39: volume firewall-scripts-io-cache
 40:     type performance/io-cache
 41:     subvolumes firewall-scripts-read-ahead
 42: end-volume
 43:
 44: volume firewall-scripts-quick-read
 45:     type performance/quick-read
 46:     subvolumes firewall-scripts-io-cache
 47: end-volume
 48:
 49: volume firewall-scripts-open-behind
 50:     type performance/open-behind
 51:     subvolumes firewall-scripts-quick-read
 52: end-volume
 53:
 54: volume firewall-scripts-md-cache
 55:     type performance/md-cache
 56:     subvolumes firewall-scripts-open-behind
 57: end-volume
 58:
 59: volume firewall-scripts
 60:     type debug/io-stats
 61:     option count-fop-hits off
 62:     option latency-measurement off
 63:     subvolumes firewall-scripts-md-cache
 64: end-volume
+------------------------------------------------------------------------------+
[2013-07-11 16:39:36.698740] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-0: changing port to 49152 (from 0)
[2013-07-11 16:39:36.698974] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-11 16:39:36.711537] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-1: changing port to 49152 (from 0)
[2013-07-11 16:39:36.711717] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-1: readv failed (No data available)
[2013-07-11 16:39:36.723116] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-07-11 16:39:36.723521] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-07-11 16:39:36.723913] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-0: Connected to 192.168.253.1:49152, attached to remote volume '/gluster-fw1'.
[2013-07-11 16:39:36.723995] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 16:39:36.724390] I [afr-common.c:3698:afr_notify] 0-firewall-scripts-replicate-0: Subvolume 'firewall-scripts-client-0' came back up; going online.
[2013-07-11 16:39:36.724601] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-0: Server lk version = 1
[2013-07-11 16:39:36.724730] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, attached to remote volume '/gluster-fw2'.
[2013-07-11 16:39:36.724788] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 16:39:36.737359] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-11 16:39:36.739297] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-11 16:39:36.739486] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-11 16:39:36.740672] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-11 16:39:36.741820] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-0

And from fw2:

[root@chicago-fw2 ~]# tail /var/log/glusterfs/firewall-scripts.log -f
[2013-07-11 15:51:45.499012] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 15:51:45.512667] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-11 15:51:45.513211] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-0: Server lk version = 1
[2013-07-11 15:51:45.513416] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-11 15:51:45.513538] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-11 15:51:45.515208] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-11 15:51:45.516512] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-1
[2013-07-11 16:21:28.150710] I [fuse-bridge.c:4583:fuse_thread_proc] 0-fuse: unmounting /firewall-scripts
[2013-07-11 16:21:28.154455] W [glusterfsd.c:970:cleanup_and_exit] (-->/usr/lib64/libc.so.6(clone+0x6d) [0x7fa599ad613d] (-->/usr/lib64/libpthread.so.0(+0x3c1b407c53) [0x7fa59a16cc53] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7fa59ae5ae35]))) 0-: received signum (15), shutting down
[2013-07-11 16:21:28.154503] I [fuse-bridge.c:5212:fini] 0-fuse: Unmounting '/firewall-scripts'.

Blank space - this is where I did mount -av

[2013-07-11 16:39:35.100584] I [glusterfsd.c:1878:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.4.0beta3 (/usr/sbin/glusterfs --volfile-id=/firewall-scripts --volfile-server=192.168.253.2 /firewall-scripts)
[2013-07-11 16:39:35.113481] I [socket.c:3480:socket_init] 0-glusterfs: SSL support is NOT enabled
[2013-07-11 16:39:35.113614] I [socket.c:3495:socket_init] 0-glusterfs: using system polling thread
[2013-07-11 16:39:35.147118] I [socket.c:3480:socket_init] 0-firewall-scripts-client-1: SSL support is NOT enabled
[2013-07-11 16:39:35.147313] I [socket.c:3495:socket_init] 0-firewall-scripts-client-1: using system polling thread
[2013-07-11 16:39:35.149112] I [socket.c:3480:socket_init] 0-firewall-scripts-client-0: SSL support is NOT enabled
[2013-07-11 16:39:35.149268] I [socket.c:3495:socket_init] 0-firewall-scripts-client-0: using system polling thread
[2013-07-11 16:39:35.149390] I [client.c:2154:notify] 0-firewall-scripts-client-0: parent translators are ready, attempting connect on transport
[2013-07-11 16:39:35.160491] I [client.c:2154:notify] 0-firewall-scripts-client-1: parent translators are ready, attempting connect on transport
Given volfile:
+------------------------------------------------------------------------------+
  1: volume firewall-scripts-client-0
  2:     type protocol/client
  3:     option password fb3955b7-a6ca-49bb-b886-d4b6609392f8
  4:     option username de6eacd1-31bc-4bdb-a049-776cd840059e
  5:     option transport-type tcp
  6:     option remote-subvolume /gluster-fw1
  7:     option remote-host 192.168.253.1
  8: end-volume
  9:
 10: volume firewall-scripts-client-1
 11:     type protocol/client
 12:     option password fb3955b7-a6ca-49bb-b886-d4b6609392f8
 13:     option username de6eacd1-31bc-4bdb-a049-776cd840059e
 14:     option transport-type tcp
 15:     option remote-subvolume /gluster-fw2
 16:     option remote-host 192.168.253.2
 17: end-volume
 18:
 19: volume firewall-scripts-replicate-0
 20:     type cluster/replicate
 21:     subvolumes firewall-scripts-client-0 firewall-scripts-client-1
 22: end-volume
 23:
 24: volume firewall-scripts-dht
 25:     type cluster/distribute
 26:     subvolumes firewall-scripts-replicate-0
 27: end-volume
 28:
 29: volume firewall-scripts-write-behind
 30:     type performance/write-behind
 31:     subvolumes firewall-scripts-dht
 32: end-volume
 33:
 34: volume firewall-scripts-read-ahead
 35:     type performance/read-ahead
 36:     subvolumes firewall-scripts-write-behind
 37: end-volume
 38:
 39: volume firewall-scripts-io-cache
 40:     type performance/io-cache
 41:     subvolumes firewall-scripts-read-ahead
 42: end-volume
 43:
 44: volume firewall-scripts-quick-read
 45:     type performance/quick-read
 46:     subvolumes firewall-scripts-io-cache
 47: end-volume
 48:
 49: volume firewall-scripts-open-behind
 50:     type performance/open-behind
 51:     subvolumes firewall-scripts-quick-read
 52: end-volume
 53:
 54: volume firewall-scripts-md-cache
 55:     type performance/md-cache
 56:     subvolumes firewall-scripts-open-behind
 57: end-volume
 58:
 59: volume firewall-scripts
 60:     type debug/io-stats
 61:     option count-fop-hits off
 62:     option latency-measurement off
 63:     subvolumes firewall-scripts-md-cache
 64: end-volume
+------------------------------------------------------------------------------+
[2013-07-11 16:39:35.173867] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-0: changing port to 49152 (from 0)
[2013-07-11 16:39:35.174065] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-firewall-scripts-client-1: changing port to 49152 (from 0)
[2013-07-11 16:39:35.174377] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-11 16:39:35.185807] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-1: readv failed (No data available)
[2013-07-11 16:39:35.197485] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-07-11 16:39:35.197740] I [client-handshake.c:1658:select_server_supported_programs] 0-firewall-scripts-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-07-11 16:39:35.198257] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-0: Connected to 192.168.253.1:49152, attached to remote volume '/gluster-fw1'.
[2013-07-11 16:39:35.198346] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 16:39:35.198546] I [afr-common.c:3698:afr_notify] 0-firewall-scripts-replicate-0: Subvolume 'firewall-scripts-client-0' came back up; going online.
[2013-07-11 16:39:35.198759] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, attached to remote volume '/gluster-fw2'.
[2013-07-11 16:39:35.198810] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-11 16:39:35.211534] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-11 16:39:35.211921] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-11 16:39:35.212098] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-0: Server lk version = 1
[2013-07-11 16:39:35.212234] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-11 16:39:35.213421] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-11 16:39:35.214372] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-1
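
For anyone who would rather not check "is it connecting to both bricks" by eye, here is a rough Python sketch that pulls the "Connected to ..." entries out of a client log. The function name connected_bricks and the SAMPLE text are just for illustration, and the pattern is based only on the 3.4.0beta3 log lines pasted above, so treat it as an assumption that other versions word the message the same way.

```python
import re

# Pattern for the client_setvolume_cbk "Connected to" entries, modelled on the
# log lines in this post. Assumption: other GlusterFS versions may word it differently.
CONNECTED = re.compile(
    r"\[(?P<ts>[\d-]+ [\d:.]+)\] I "
    r"\[client-handshake\.c:\d+:client_setvolume_cbk\] "
    r"(?P<client>\S+): Connected to (?P<host>[\d.]+):(?P<port>\d+), "
    r"attached to remote volume '(?P<brick>[^']+)'"
)


def connected_bricks(log_text):
    """Map each client translator name to the (host, port, brick) it connected to."""
    found = {}
    for match in CONNECTED.finditer(log_text):
        found[match.group("client")] = (
            match.group("host"),
            int(match.group("port")),
            match.group("brick"),
        )
    return found


# Two entries copied from the fw1 log above, used as sample input.
SAMPLE = (
    "[2013-07-11 16:39:36.723913] I [client-handshake.c:1456:client_setvolume_cbk] "
    "0-firewall-scripts-client-0: Connected to 192.168.253.1:49152, "
    "attached to remote volume '/gluster-fw1'.\n"
    "[2013-07-11 16:39:36.724730] I [client-handshake.c:1456:client_setvolume_cbk] "
    "0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, "
    "attached to remote volume '/gluster-fw2'.\n"
)

if __name__ == "__main__":
    for client, (host, port, brick) in sorted(connected_bricks(SAMPLE).items()):
        print(client, host, port, brick)
```

To run it against a real log, read /var/log/glusterfs/firewall-scripts.log into a string and pass that in place of SAMPLE. If it reports fewer bricks than the volume has, that client never finished its handshake with the missing one. Note that it scans the whole file, so entries from earlier mounts are included unless you trim the log first.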