I have connectivity errors with GlusterFS mount points that I can't resolve. We have a pretty basic setup with two Gluster bricks and a bunch of clients (all 3.3.2). Very occasionally we have a brief network outage, after which some Gluster mount points become unavailable. The other Gluster mounts on other servers to the same bricks have no problems.

The console on the client shows:

    mountall: Plymouth command failed
    mountall: Disconnected from Plymouth
    mountall: Event failed
    mountall: Skipping mounting /home since Plymouth is not available

A manual mount gives:

    $ sudo mount /home
    unknown option _netdev (ignored)
    ERROR: Mount point does not exist.
    Usage: mount.glusterfs <volumeserver>:<volumeid/volumeport> -o <options> <mount point>

On the client, I can see a few hung connections (lsof | grep TCP shows them stuck in SYN_SENT, source port 24010 on client). The iptables connection tracker also seems to have issues:

    Nov 22 09:28:36 app16 kernel: [3180197.360596] [INPUT] dropped IN=eth0 OUT= MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 WINDOW=14480 RES=0x00 ACK SYN URGP=0
    Nov 22 09:28:37 app16 kernel: [3180198.156075] [INPUT] dropped IN=eth0 OUT= MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 WINDOW=14480 RES=0x00 ACK SYN URGP=0
    Nov 22 09:28:44 app16 kernel: [3180205.377404] [INPUT] dropped IN=eth0 OUT= MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 WINDOW=14480 RES=0x00 ACK SYN URGP=0
    Nov 22 09:28:45 app16 kernel: [3180206.160003] [INPUT] dropped IN=eth0 OUT= MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 WINDOW=14480 RES=0x00 ACK SYN URGP=0
    Nov 22 09:29:00 app16 kernel: [3180221.410958] [INPUT] dropped IN=eth0 OUT= MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 WINDOW=14480 RES=0x00 ACK SYN URGP=0
    Nov 22 09:29:00 app16 kernel: [3180222.154831] [INPUT] dropped IN=eth0 OUT= MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 WINDOW=14480 RES=0x00 ACK SYN URGP=0

The workaround is to manually umount and mount the failed shares. After that there are no more SYN_SENT connections in lsof and the share is accessible again.

But what is the cause of this? We need the shares to be available at any time, especially after the network recovers. That's the whole point of distributed file systems...

Some background info. /etc/fstab contains:

    file1.cluster.peercode.nl:GLUSTER-HOME /home glusterfs nobootwait,backupvolfile-server=file2.cluster.peercode.nl 0 0

This is the log of brick 10.243.0.76 during a short network hiccup:

    [2013-11-21 21:57:07.877100] W [client3_1-fops.c:647:client3_1_unlink_cbk] 0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
    [2013-11-21 22:07:07.984100] W [client3_1-fops.c:647:client3_1_unlink_cbk] 0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
    [2013-11-21 22:17:08.093102] W [client3_1-fops.c:647:client3_1_unlink_cbk] 0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
    [2013-11-21 22:25:53.475072] W [socket.c:195:__socket_rwv] 0-GLUSTER-HOME-client-1: readv failed (Connection reset by peer)
    [2013-11-21 22:25:53.475149] W [socket.c:1512:__socket_proto_state_machine] 0-GLUSTER-HOME-client-1: reading from socket failed. Error (Connection reset by peer), peer (10.243.0.24:24009)
    [2013-11-21 22:25:53.492487] I [client.c:2090:client_rpc_notify] 0-GLUSTER-HOME-client-1: disconnected
    [2013-11-21 22:25:54.536414] W [socket.c:195:__socket_rwv] 0-GLUSTER-SHARE-client-1: readv failed (Connection reset by peer)
    [2013-11-21 22:25:54.536454] W [socket.c:1512:__socket_proto_state_machine] 0-GLUSTER-SHARE-client-1: reading from socket failed. Error (Connection reset by peer), peer (10.243.0.24:24010)
    [2013-11-21 22:25:54.536503] I [client.c:2090:client_rpc_notify] 0-GLUSTER-SHARE-client-1: disconnected
    [2013-11-21 22:26:03.539704] I [client-handshake.c:1614:select_server_supported_programs] 0-GLUSTER-HOME-client-1: Using Program GlusterFS 3.3.1, Num (1298437), Version (330)
    [2013-11-21 22:26:03.541640] I [client-handshake.c:1411:client_setvolume_cbk] 0-GLUSTER-HOME-client-1: Connected to 10.243.0.24:24009, attached to remote volume '/data/export-home-2'.
    [2013-11-21 22:26:03.541668] I [client-handshake.c:1423:client_setvolume_cbk] 0-GLUSTER-HOME-client-1: Server and Client lk-version numbers are not same, reopening the fds
    [2013-11-21 22:26:03.548534] I [client-handshake.c:453:client_set_lk_version_cbk] 0-GLUSTER-HOME-client-1: Server lk version = 1
    [2013-11-21 22:26:05.536563] I [client-handshake.c:1614:select_server_supported_programs] 0-GLUSTER-SHARE-client-1: Using Program GlusterFS 3.3.2, Num (1298437), Version (330)
    [2013-11-21 22:26:05.537510] I [client-handshake.c:1411:client_setvolume_cbk] 0-GLUSTER-SHARE-client-1: Connected to 10.243.0.24:24010, attached to remote volume '/data/export-share-2'.
    [2013-11-21 22:26:05.537530] I [client-handshake.c:1423:client_setvolume_cbk] 0-GLUSTER-SHARE-client-1: Server and Client lk-version numbers are not same, reopening the fds
    [2013-11-21 22:26:05.541133] I [client-handshake.c:453:client_set_lk_version_cbk] 0-GLUSTER-SHARE-client-1: Server lk version = 1
    [2013-11-21 22:27:08.549143] W [client3_1-fops.c:647:client3_1_unlink_cbk] 0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
    [2013-11-21 22:37:08.655387] W [client3_1-fops.c:647:client3_1_unlink_cbk] 0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
    [2013-11-21 22:47:05.551891] W [socket.c:195:__socket_rwv] 0-GLUSTER-SHARE-client-1: readv failed (Connection timed out)
    [2013-11-21 22:47:05.551961] W [socket.c:1512:__socket_proto_state_machine] 0-GLUSTER-SHARE-client-1: reading from socket failed. Error (Connection timed out), peer (10.243.0.24:24010)
    [2013-11-21 22:47:05.552011] I [client.c:2090:client_rpc_notify] 0-GLUSTER-SHARE-client-1: disconnected
    [2013-11-21 22:47:07.599889] W [socket.c:195:__socket_rwv] 0-GLUSTER-HOME-client-1: readv failed (Connection timed out)
    [2013-11-21 22:47:07.599956] W [socket.c:1512:__socket_proto_state_machine] 0-GLUSTER-HOME-client-1: reading from socket failed. Error (Connection timed out), peer (10.243.0.24:24009)
    [2013-11-21 22:47:07.600008] I [client.c:2090:client_rpc_notify] 0-GLUSTER-HOME-client-1: disconnected
    [2013-11-21 22:47:08.761366] E [afr-self-heald.c:418:_crawl_proceed] 0-GLUSTER-SHARE-replicate-0: Stopping crawl as < 2 children are up
    [2013-11-21 22:47:08.764653] E [afr-self-heald.c:418:_crawl_proceed] 0-GLUSTER-HOME-replicate-0: Stopping crawl as < 2 children are up
    [2013-11-21 22:47:18.759922] E [socket.c:1715:socket_connect_finish] 0-GLUSTER-HOME-client-1: connection to 10.243.0.24:24009 failed (No route to host)
    [2013-11-21 22:48:18.907865] E [socket.c:1715:socket_connect_finish] 0-GLUSTER-SHARE-client-1: connection to 10.243.0.24:24010 failed (Connection timed out)
    [2013-11-21 22:49:50.825110] I [client-handshake.c:1614:select_server_supported_programs] 0-GLUSTER-HOME-client-1: Using Program GlusterFS 3.3.1, Num (1298437), Version (330)
    [2013-11-21 22:49:50.825887] I [client-handshake.c:1411:client_setvolume_cbk] 0-GLUSTER-HOME-client-1: Connected to 10.243.0.24:24009, attached to remote volume '/data/export-home-2'.
    [2013-11-21 22:49:50.825906] I [client-handshake.c:1423:client_setvolume_cbk] 0-GLUSTER-HOME-client-1: Server and Client lk-version numbers are not same, reopening the fds
    [2013-11-21 22:49:50.826525] I [client-handshake.c:453:client_set_lk_version_cbk] 0-GLUSTER-HOME-client-1: Server lk version = 1
    [2013-11-21 22:49:52.863320] I [client-handshake.c:1614:select_server_supported_programs] 0-GLUSTER-SHARE-client-1: Using Program GlusterFS 3.3.2, Num (1298437), Version (330)
    [2013-11-21 22:49:52.864061] I [client-handshake.c:1411:client_setvolume_cbk] 0-GLUSTER-SHARE-client-1: Connected to 10.243.0.24:24010, attached to remote volume '/data/export-share-2'.
    [2013-11-21 22:49:52.864089] I [client-handshake.c:1423:client_setvolume_cbk] 0-GLUSTER-SHARE-client-1: Server and Client lk-version numbers are not same, reopening the fds
    [2013-11-21 22:49:52.864841] I [client-handshake.c:453:client_set_lk_version_cbk] 0-GLUSTER-SHARE-client-1: Server lk version = 1
    [2013-11-21 22:57:08.913844] W [client3_1-fops.c:647:client3_1_unlink_cbk] 0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
    [2013-11-21 23:07:09.033899] W [client3_1-fops.c:647:client3_1_unlink_cbk] 0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
    [2013-11-21 23:17:09.160547] W [client3_1-fops.c:647:client3_1_unlink_cbk] 0-GLUSTER-HOME-client-0: remote operation failed: No such file or directory
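
To make the iptables side easier to compare across clients, here is a rough sketch (in Python) that counts the dropped packets in the kernel log per source, destination, ports and TCP flags. It only assumes the log format of the "[INPUT] dropped" lines quoted in this post; the names parse_drop and summarise are my own and are not part of any tool. It is meant for eyeballing which replies are being dropped (here the SYN-ACKs from 10.243.0.24 port 24010), not as a diagnosis.

```python
import re
from collections import Counter

# key=value fields of an iptables LOG line, e.g. SRC=10.243.0.24 or OUT= (empty value)
FIELD_RE = re.compile(r"\b([A-Z]+)=(\S*)")
# TCP flags are printed as bare words between RES=... and URGP=...
FLAGS_RE = re.compile(r"RES=\S+\s+(.*?)\s*URGP=")


def parse_drop(line):
    """Return the fields of one 'dropped' log line as a dict, or None for other lines."""
    if "dropped" not in line:
        return None
    fields = dict(FIELD_RE.findall(line))
    match = FLAGS_RE.search(line)
    fields["FLAGS"] = tuple(match.group(1).split()) if match else ()
    return fields


def summarise(lines):
    """Count dropped packets per (SRC, SPT, DST, DPT, FLAGS)."""
    counts = Counter()
    for line in lines:
        fields = parse_drop(line)
        if fields is None:
            continue
        key = (fields.get("SRC"), fields.get("SPT"),
               fields.get("DST"), fields.get("DPT"), fields["FLAGS"])
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # One of the lines from the kernel log above, used as sample input.
    sample = (
        "Nov 22 09:28:36 app16 kernel: [3180197.360596] [INPUT] dropped IN=eth0 OUT= "
        "MAC=aa:01:60:00:90:4c:aa:01:60:00:87:cb:08:00 SRC=10.243.0.24 DST=10.243.0.76 "
        "LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=24010 DPT=1021 "
        "WINDOW=14480 RES=0x00 ACK SYN URGP=0"
    )
    for key, count in summarise([sample]).items():
        print(count, key)
```

In practice I would feed it the lines of the kernel log file instead of the single sample line; where that file lives depends on the syslog configuration.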