I upgraded my systems to 3.6.3, and some of my clients are now having issues connecting. I can mount using NFS without any problems. However, when I try a FUSE mount, it times out on many of my nodes. It mounted on approximately 400 nodes, but the remainder timed out. Any suggestions for how to fix this?
On the client side, I am getting the following in the logs:
[2015-05-05 00:17:18.013319] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.3 (args: /usr/sbin/glusterfs --volfile-server=gfsib01a.corvidtec.com --volfile-server-transport=tcp --volfile-id=/homegfs.tcp /homegfs_test)
[2015-05-05 00:18:21.019012] E [socket.c:2276:socket_connect_finish] 0-glusterfs: connection to 10.1.70.1:24007 failed (Connection timed out)
[2015-05-05 00:18:21.019092] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: gfsib01a.corvidtec.com (Transport endpoint is not connected)
[2015-05-05 00:18:21.019100] I [glusterfsd-mgmt.c:1817:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers
[2015-05-05 00:18:21.019224] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (1), shutting down
[2015-05-05 00:18:21.019239] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/homegfs_test'.
[2015-05-05 00:18:21.027770] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down
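Since the log shows the connect to 10.1.70.1:24007 timing out, one thing worth ruling out on the failing nodes is plain TCP reachability of glusterd's management port, independent of GlusterFS itself. A minimal sketch of such a check (Python, standard library only; the hostname and port are taken from the log above, purely as an illustration):

```python
import socket

def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# glusterd management port on the volfile server (values from the log above):
# can_reach("gfsib01a.corvidtec.com", 24007)
```

If this returns False on exactly the nodes where the FUSE mount times out, the problem would be basic network reachability (firewall, routing, connection limits) rather than anything in the volume configuration.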
Logs from my server are attached. Here is the volume status and info from one of the servers:
[root@gfs01a log]# gluster volume status homegfs
Status of volume: homegfs
Gluster process                                      Port   Online  Pid
------------------------------------------------------------------------------
Brick gfsib01a.corvidtec.com:/data/brick01a/homegfs  49152  Y       3816
Brick gfsib01b.corvidtec.com:/data/brick01b/homegfs  49152  Y       3826
Brick gfsib01a.corvidtec.com:/data/brick02a/homegfs  49153  Y       3821
Brick gfsib01b.corvidtec.com:/data/brick02b/homegfs  49153  Y       3831
Brick gfsib02a.corvidtec.com:/data/brick01a/homegfs  49152  Y       3959
Brick gfsib02b.corvidtec.com:/data/brick01b/homegfs  49152  Y       3970
Brick gfsib02a.corvidtec.com:/data/brick02a/homegfs  49153  Y       3964
Brick gfsib02b.corvidtec.com:/data/brick02b/homegfs  49153  Y       3975
NFS Server on localhost                              2049   Y       3830
Self-heal Daemon on localhost                        N/A    Y       3835
NFS Server on gfsib01b.corvidtec.com                 2049   Y       3840
Self-heal Daemon on gfsib01b.corvidtec.com           N/A    Y       3845
NFS Server on gfsib02b.corvidtec.com                 2049   Y       3984
Self-heal Daemon on gfsib02b.corvidtec.com           N/A    Y       3989
NFS Server on gfsib02a.corvidtec.com                 2049   Y       3973
Self-heal Daemon on gfsib02a.corvidtec.com           N/A    Y       3978
Task Status of Volume homegfs
------------------------------------------------------------------------------
Task : Rebalance
ID : 58b6cc76-c29c-4695-93fe-c42b1112e171
Status : completed
[root@gfs01a log]# gluster volume info homegfs
Volume Name: homegfs
Type: Distributed-Replicate
Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
Options Reconfigured:
server.manage-gids: on
changelog.rollover-time: 15
changelog.fsync-interval: 3
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: off
storage.owner-gid: 100
network.ping-timeout: 10
server.allow-insecure: on
performance.write-behind-window-size: 128MB
performance.cache-size: 128MB
performance.io-thread-count: 32
David