Hello,

I've got a basic GlusterFS setup: one volume with a single brick on a single server (which I'll call "server"), accessed by two clients (which I'll call "client1" and "client2"), all connected over ordinary Ethernet. All the systems run CentOS 6.3 and Gluster 3.3 (from the RPMs provided on gluster.org). Both clients mount the volume with the FUSE client via fstab, using only default settings, and the volume transport is tcp.

One of the two clients, client1, works perfectly: it can mount the volume and access all the data reliably and without any issues. The second client, client2, has a lot of problems. Sometimes the volume mounts correctly and everything works well until you reboot, at which point it hangs on startup and eventually times out trying to mount the volume. I also see "interrupted system call" during boot on client2, whether the mount succeeds or fails. I run a separate test setup with a very similar configuration, but with more clients, and they all work fine.

The server filters hosts via iptables, and I have opened the necessary ports specifically for both clients. Here's the relevant excerpt from the iptables configuration:

    ACCEPT  tcp  --  [client1ip]  0.0.0.0/0  state NEW tcp dpts:24007:24010
    ACCEPT  tcp  --  [client1ip]  0.0.0.0/0  state NEW tcp dpt:111
    ACCEPT  udp  --  [client1ip]  0.0.0.0/0  state NEW udp dpt:111
    ACCEPT  tcp  --  [client2ip]  0.0.0.0/0  state NEW tcp dpts:24007:24010
    ACCEPT  tcp  --  [client2ip]  0.0.0.0/0  state NEW tcp dpt:111
    ACCEPT  udp  --  [client2ip]  0.0.0.0/0  state NEW udp dpt:111

I have also tried disabling the firewall completely, and it appears to make no difference.
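For reference, the clients use a plain fstab entry along these lines (the mount point matches the /data path that appears in the logs; the exact line here is illustrative, with nothing beyond the defaults):

```
# /etc/fstab on both clients -- default FUSE mount, no extra options
server:data-volume  /data  glusterfs  defaults  0 0
```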
I have the Gluster IP access filter explicitly set to allow all:

    Volume Name: data-volume
    Type: Distribute
    Volume ID: c4140398-393d-414d-9062-d4ce26a90db6
    Status: Started
    Number of Bricks: 1
    Transport-type: tcp
    Bricks:
    Brick1: server:/data
    Options Reconfigured:
    auth.allow: *

As I mentioned, client1 can access everything without any issues, but client2 will, about 99% of the time, fail to mount the volume at all. Very occasionally it will succeed, and everything will work fine until the machine is rebooted, at which point it goes back to failing. Here is what I see in the log, with full debug, when I attempt a manual mount on client2 (edited to redact addresses):

    # mount.glusterfs server:data-volume -o log-level=TRACE /data
    Mount failed. Please check the log file for more details.
    # cat /var/log/glusterfs/data.log
    [2012-08-28 16:18:05.679911] I [glusterfsd.c:1666:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.3.0
    [2012-08-28 16:18:05.679971] T [xlator.c:198:xlator_dynload] 0-xlator: attempt to load file /usr/lib64/glusterfs/3.3.0/xlator/mount/fuse.so
    [2012-08-28 16:18:05.680242] T [xlator.c:250:xlator_dynload] 0-xlator: dlsym(reconfigure) on /usr/lib64/glusterfs/3.3.0/xlator/mount/fuse.so: undefined symbol: reconfigure -- neglecting
    [2012-08-28 16:18:05.680274] D [glusterfsd.c:395:create_fuse_mount] 0-: fuse direct io type 2
    [2012-08-28 16:18:05.681122] D [rpc-clnt.c:973:rpc_clnt_connection_init] 0-glusterfs: defaulting frame-timeout to 30mins
    [2012-08-28 16:18:05.681200] D [rpc-transport.c:248:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.3.0/rpc-transport/socket.so
    [2012-08-28 16:18:05.681517] T [options.c:77:xlator_option_validate_int] 0-glusterfs: no range check required for 'option remote-port 24007'
    [2012-08-28 16:18:05.681555] D [rpc-clnt.c:1379:rpcclnt_cbk_program_register] 0-glusterfs: New program registered: GlusterFS Callback, Num: 52743234, Ver: 1
    [2012-08-28 16:18:05.681572] T [rpc-clnt.c:429:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
    [2012-08-28 16:18:05.681598] T [common-utils.c:111:gf_resolve_ip6] 0-resolver: DNS cache not present, freshly probing hostname: server
    [2012-08-28 16:18:05.763206] D [common-utils.c:151:gf_resolve_ip6] 0-resolver: returning ip-XXX.XXX.XXX.XXX (port-24007) for hostname: server and port: 24007
    [2012-08-28 16:18:05.768441] T [socket.c:370:__socket_nodelay] 0-glusterfs: NODELAY enabled for socket 8
    [2012-08-28 16:18:05.768473] T [socket.c:424:__socket_keepalive] 0-glusterfs: Keep-alive enabled for socket 8, interval 2, idle: 20
    [2012-08-28 16:18:08.769439] T [rpc-clnt.c:429:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
    [2012-08-28 16:18:08.769737] T [socket.c:2003:socket_connect] (-->/lib64/libpthread.so.0() [0x3817e07851] (-->/usr/lib64/libglusterfs.so.0(gf_timer_proc+0xd0) [0x3c9e22a880] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0x99) [0x3c9de0e999]))) 0-glusterfs: connect () called on transport already connected
    [2012-08-28 16:18:11.770136] T [rpc-clnt.c:429:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
    ... (reconnects repeat a total of 22 times, all with the same "transport already connected" message) ...
    [2012-08-28 16:19:08.778012] T [rpc-clnt.c:429:rpc_clnt_reconnect] 0-glusterfs: attempting reconnect
    [2012-08-28 16:19:08.778295] T [socket.c:2003:socket_connect] (-->/lib64/libpthread.so.0() [0x3817e07851] (-->/usr/lib64/libglusterfs.so.0(gf_timer_proc+0xd0) [0x3c9e22a880] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_reconnect+0x99) [0x3c9de0e999]))) 0-glusterfs: connect () called on transport already connected
    [2012-08-28 16:19:08.881543] E [socket.c:1715:socket_connect_finish] 0-glusterfs: connection to  failed (Connection timed out)
    [2012-08-28 16:19:08.881602] D [socket.c:280:__socket_disconnect] 0-glusterfs: shutdown() returned -1. Transport endpoint is not connected
    [2012-08-28 16:19:08.881629] T [rpc-clnt.c:535:rpc_clnt_connection_cleanup] 0-glusterfs: cleaning up state in transport object 0x182b760
    [2012-08-28 16:19:08.881653] E [glusterfsd-mgmt.c:1783:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: Transport endpoint is not connected
    [2012-08-28 16:19:08.881672] I [glusterfsd-mgmt.c:1786:mgmt_rpc_notify] 0-glusterfsd-mgmt: -1 connect attempts left
    [2012-08-28 16:19:08.881739] W [glusterfsd.c:831:cleanup_and_exit] (-->/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28) [0x3c9de0b018] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xc0) [0x3c9de0f830] (-->/usr/sbin/glusterfs() [0x40c026]))) 0-: received signum (1), shutting down
    [2012-08-28 16:19:08.881776] D [glusterfsd-mgmt.c:2154:glusterfs_mgmt_pmap_signout] 0-fsd-mgmt: portmapper signout arguments not given
    [2012-08-28 16:19:08.881803] I [fuse-bridge.c:4643:fini] 0-fuse: Unmounting '/data'.

The server typically logs nothing during these attempts, but occasionally logs something like this:

    # cat /var/log/glusterfs/bricks/data.log
    [2012-08-26 03:17:01.378270] I [glusterfsd-mgmt.c:1565:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
    [2012-08-28 13:25:08.231095] I [server-handshake.c:571:server_setvolume] 0-data-volume-server: accepted client from client2-2896-2012/08/28-13:25:04:222124-data-volume-client-0-0 (version: 3.3.0)
    [2012-08-28 13:27:22.860474] W [socket.c:195:__socket_rwv] 0-tcp.data-volume-server: readv failed (Connection timed out)
    [2012-08-28 13:27:22.860555] I [server.c:685:server_rpc_notify] 0-data-volume-server: disconnecting connectionfrom client2-2896-2012/08/28-13:25:04:222124-data-volume-client-0-0
    [2012-08-28 13:27:22.860577] I [server-helpers.c:741:server_connection_put] 0-data-volume-server: Shutting down connection client2-2896-2012/08/28-13:25:04:222124-data-volume-client-0-0
    [2012-08-28 13:27:22.860602] I [server-helpers.c:629:server_connection_destroy] 0-data-volume-server: destroyed connection of client2-2896-2012/08/28-13:25:04:222124-data-volume-client-0-0
    [2012-08-28 13:27:35.079569] I [server-handshake.c:571:server_setvolume] 0-data-volume-server: accepted client from client2-2867-2012/08/28-13:27:31:164208-data-volume-client-0-0 (version: 3.3.0)
    [2012-08-28 13:35:02.540383] W [socket.c:195:__socket_rwv] 0-tcp.data-volume-server: readv failed (Connection timed out)
    [2012-08-28 13:35:02.540462] I [server.c:685:server_rpc_notify] 0-data-volume-server: disconnecting connectionfrom client2-2867-2012/08/28-13:27:31:164208-data-volume-client-0-0
    [2012-08-28 13:35:02.540486] I [server-helpers.c:741:server_connection_put] 0-data-volume-server: Shutting down connection client2-2867-2012/08/28-13:27:31:164208-data-volume-client-0-0
    [2012-08-28 13:35:02.540511] I [server-helpers.c:629:server_connection_destroy] 0-data-volume-server: destroyed connection of client2-2867-2012/08/28-13:27:31:164208-data-volume-client-0-0
    [2012-08-28 13:35:19.171448] I [server-handshake.c:571:server_setvolume] 0-data-volume-server: accepted client from client2-2857-2012/08/28-13:35:10:386705-data-volume-client-0-0 (version: 3.3.0)
    [2012-08-28 13:41:36.513626] W [socket.c:195:__socket_rwv] 0-tcp.data-volume-server: readv failed (Connection timed out)
    [2012-08-28 13:41:36.513707] I [server.c:685:server_rpc_notify] 0-data-volume-server: disconnecting connectionfrom client2-2857-2012/08/28-13:35:10:386705-data-volume-client-0-0
    [2012-08-28 13:41:36.513726] I [server-helpers.c:741:server_connection_put] 0-data-volume-server: Shutting down connection client2-2857-2012/08/28-13:35:10:386705-data-volume-client-0-0
    [2012-08-28 13:41:36.513752] I [server-helpers.c:629:server_connection_destroy] 0-data-volume-server: destroyed connection of client2-2857-2012/08/28-13:35:10:386705-data-volume-client-0-0

I don't know whether these readv failures are actually related to the problem, though.
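To rule out basic TCP reachability, this is the kind of quick probe I can run from client2 (the hostname "server" and the port list simply mirror the iptables rules above; the script itself is just an illustrative check, not part of the gluster tooling):

```python
#!/usr/bin/env python
# Quick TCP reachability probe for the GlusterFS ports opened in iptables.
# "server" is a placeholder for the real brick host.
import socket

def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return True
    except socket.error:
        # Covers refused, timed out, and unresolvable-host errors alike.
        return False
    finally:
        s.close()

if __name__ == "__main__":
    # Portmapper plus the glusterd/brick port range from the firewall rules.
    for port in [111, 24007, 24008, 24009, 24010]:
        state = "open" if port_open("server", port) else "CLOSED/unreachable"
        print("port %d: %s" % (port, state))
```

If any of these report closed only from client2, that would point at the network path rather than gluster itself.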
Thanks,
Leo