Brian, I'm not ready to give up just yet.

From Rejy:
> Would not the mount option 'backupvolfile-server=<secondary server> help
> at mount time, in the case of the primary server not being available ?

Hmmm - this seems to be a step in the right direction.

On both nodes I did:

umount /firewall-scripts

Then on fw1:

[root@chicago-fw1 gregs]# mount -t glusterfs -o backupvolfile-server=192.168.253.2 192.168.253.1:/firewall-scripts /firewall-scripts

And on fw2:

[root@chicago-fw2 ~]# mount -t glusterfs -o backupvolfile-server=192.168.253.1 192.168.253.2:/firewall-scripts /firewall-scripts

For the test I just ran, each node still uses its local copy first. For my application, I'm not super concerned about conflicts between one directory and the other because my /firewall-scripts directory will be read-mostly when this is in production. And as part of my startup, the node with the lowest IP Address takes itself offline for a few seconds so the other node detects it's down and can assume the primary role. That's what put me on to this Gluster behavior in the first place - fw2 could not find its script to take control even though a copy of it was sitting right there on its local disk.

Anyway, this time with the file system mounted as above, I took fw1 offline and from fw2 did, "ls /firewall-scripts". This time fw2 waited several seconds and then showed me the directory listing instead of blowing up with an error. Which seems strange to me since I told fw2 that fw1 is its backupvolfile-server and fw1 went offline. So the behavior is definitely not intuitive.

One other detail that may be relevant - I take fw1 offline by inserting a firewall rule that does a REJECT on that interface. That probably explains the "Connection refused" message in the log extract below. I can try a different test, changing the rule to DROP so it really really is offline and see what happens.

The log on fw2 looks a little different this time. This tail was taken after doing an ls from fw2.

Pranith - is this the log you mean? If so, I can do the tests again and keep a tail -f in a different window when the other node goes offline, so we catch the messages right at that event. Will this be helpful? I can send tarballs of the whole log file, but it's huge and finding the key messages seems like a needle in a haystack.

[root@chicago-fw2 ~]# tail /var/log/glusterfs/firewall-scripts.log -f
[2013-07-10 10:37:59.446481] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (Connection reset by peer)
[2013-07-10 10:37:59.446558] W [socket.c:1962:__socket_proto_state_machine] 0-firewall-scripts-client-0: reading from socket failed. Error (Connection reset by peer), peer (192.168.253.1:49152)
[2013-07-10 10:37:59.447322] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x48) [0x7f8974409b78] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb8) [0x7f8974408028] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f8974407f4e]))) 0-firewall-scripts-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2013-07-10 10:37:33.563280 (xid=0x24x)
[2013-07-10 10:37:59.447378] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-firewall-scripts-client-0: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
[2013-07-10 10:37:59.447716] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x48) [0x7f8974409b78] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb8) [0x7f8974408028] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f8974407f4e]))) 0-firewall-scripts-client-0: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2013-07-10 10:37:35.949434 (xid=0x25x)
[2013-07-10 10:37:59.447754] W [client-handshake.c:276:client_ping_cbk] 0-firewall-scripts-client-0: timer must have expired
[2013-07-10 10:37:59.447821] I [client.c:2097:client_rpc_notify] 0-firewall-scripts-client-0: disconnected
[2013-07-10 10:38:09.963388] E [socket.c:2157:socket_connect_finish] 0-firewall-scripts-client-0: connection to 192.168.253.1:24007 failed (Connection refused)
[2013-07-10 10:38:09.963493] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:38:19.988428] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:38:53.044399] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:38:54.999683] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:38:58.010774] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:04.028362] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:07.033038] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:10.044094] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:16.060406] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:19.066521] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:22.077600] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:25.088684] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:28.099805] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:31.110840] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:34.121921] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:37.133003] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:40.144084] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:43.155168] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:46.166228] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:49.177270] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:52.188359] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:55.199451] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-0: readv failed (No data available)
^C

And the log from fw1 looks like this:

[root@chicago-fw1 gregs]# tail /var/log/glusterfs/firewall-scripts.log -f
[2013-07-10 10:36:19.708342] I [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, attached to remote volume '/gluster-fw2'.
[2013-07-10 10:36:19.708372] I [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-07-10 10:36:19.720679] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: switched to graph 0
[2013-07-10 10:36:19.721049] I [client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: Server lk version = 1
[2013-07-10 10:36:19.721291] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-10 10:36:19.722390] I [afr-common.c:2057:afr_set_root_inode_on_first_lookup] 0-firewall-scripts-replicate-0: added root inode
[2013-07-10 10:36:19.723259] I [afr-common.c:2120:afr_discovery_cbk] 0-firewall-scripts-replicate-0: selecting local read_child firewall-scripts-client-0
[2013-07-10 10:37:47.242308] W [socket.c:514:__socket_rwv] 0-firewall-scripts-client-1: readv failed (Connection timed out)
[2013-07-10 10:37:47.242385] W [socket.c:1962:__socket_proto_state_machine] 0-firewall-scripts-client-1: reading from socket failed. Error (Connection timed out), peer (192.168.253.2:49152)
[2013-07-10 10:37:47.242462] I [client.c:2097:client_rpc_notify] 0-firewall-scripts-client-1: disconnected
^C
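
On the needle-in-a-haystack problem: rather than sending the whole log, one option might be to cut it down to the warnings and errors around the moment the other node goes offline. The following is only a sketch. The function name filter_gluster_log is made up for illustration, and it assumes every log entry starts with "[date time] LEVEL" exactly as in the extracts above, with all entries from the same day.

```shell
# Hypothetical helper (not part of GlusterFS): read a client log on stdin and
# print only the W (warning) and E (error) entries whose time is at or after
# the HH:MM:SS given as the first argument (default: start of day).
filter_gluster_log() {
    start="${1:-00:00:00}"
    awk -v start="$start" '
        # $1 is "[YYYY-MM-DD", $2 is "HH:MM:SS.micros]", $3 is the level letter
        $1 ~ /^\[[0-9-]+$/ && ($3 == "W" || $3 == "E") {
            t = substr($2, 1, 8)       # keep just HH:MM:SS
            if (t >= start) print      # string comparison works for fixed-width times
        }
    '
}
```

For the test above it could be run as "filter_gluster_log 10:37:30 < /var/log/glusterfs/firewall-scripts.log", which would skip the informational startup messages and keep the disconnect and reconnect attempts.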