Gluster Community,

I'm having a terrible time just trying to get started with gluster. I'm running CentOS 5.7 on a few nodes, and have installed gluster 3.2.4 and its prereqs from RPMs. Yet I'm finding it impossible to create a simple 2-brick distributed volume. I keep seeing this error a lot:

reading from socket failed. Error (Transport endpoint is not connected)

referring to both the localhost and peers. There is no iptables running on any of these machines, and all machines can ssh to each other and report that their peers are connected.

I've googled this and other errors I've seen, and many results point to this site, but none of the suggestions I've read have helped me. The glusterfsd processes are running. The peers are connected. I've done multiple reboots and restarts of daemons. This is a fresh install. Details are listed below.

Can someone please help me out?

Thanks!
-Mark Sullivan
Diviner Lunar Radiometer Experiment

==========================================================================================================

On gluster03, creating a volume "glue" which is made up of gluster03:/g1 and gluster04:/g1

gluster volume create glue transport tcp gluster03:/g1 gluster04:/g1
gluster volume set glue auth.allow 10.*
gluster volume start glue

The "etc*" log files show this:

[2011-11-13 16:10:22.429786] I [glusterd-handler.c:900:glusterd_handle_create_volume] 0-glusterd: Received create volume req
[2011-11-13 16:10:22.430303] I [glusterd-utils.c:243:glusterd_lock] 0-glusterd: Cluster lock held by fb1f46cf-a03a-4fcd-b103-735040af3ced
[2011-11-13 16:10:22.430330] I [glusterd-handler.c:420:glusterd_op_txn_begin] 0-glusterd: Acquired local lock
[2011-11-13 16:10:22.430777] I [glusterd-rpc-ops.c:752:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 7c9ee90c-91a5-45c0-aaf9-8b8a7347b67d
[2011-11-13 16:10:22.431182] I [glusterd-op-sm.c:6543:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 1 peers
[2011-11-13 16:10:22.431814] I [glusterd-rpc-ops.c:1050:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: 7c9ee90c-91a5-45c0-aaf9-8b8a7347b67d
[2011-11-13 16:10:22.470773] I [glusterd-op-sm.c:6660:glusterd_op_ac_send_commit_op] 0-glusterd: Sent op req to 1 peers
[2011-11-13 16:10:22.489143] I [glusterd-rpc-ops.c:1236:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 7c9ee90c-91a5-45c0-aaf9-8b8a7347b67d
[2011-11-13 16:10:22.489566] I [glusterd-rpc-ops.c:811:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 7c9ee90c-91a5-45c0-aaf9-8b8a7347b67d
[2011-11-13 16:10:22.489604] I [glusterd-op-sm.c:7077:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2011-11-13 16:10:22.492971] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1023)
[2011-11-13 16:10:22.611682] I [glusterd-utils.c:243:glusterd_lock] 0-glusterd: Cluster lock held by fb1f46cf-a03a-4fcd-b103-735040af3ced
[2011-11-13 16:10:22.611709] I [glusterd-handler.c:420:glusterd_op_txn_begin] 0-glusterd: Acquired local lock
[2011-11-13 16:10:22.612096] I [glusterd-rpc-ops.c:752:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 7c9ee90c-91a5-45c0-aaf9-8b8a7347b67d
[2011-11-13 16:10:22.896543] I [glusterd-op-sm.c:6543:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 1 peers
[2011-11-13 16:10:23.55185] I [glusterd-rpc-ops.c:1050:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: 7c9ee90c-91a5-45c0-aaf9-8b8a7347b67d
[2011-11-13 16:10:23.64798] I [glusterd-op-sm.c:6660:glusterd_op_ac_send_commit_op] 0-glusterd: Sent op req to 1 peers
[2011-11-13 16:10:23.74209] I [glusterd-rpc-ops.c:1236:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 7c9ee90c-91a5-45c0-aaf9-8b8a7347b67d
[2011-11-13 16:10:23.74527] I [glusterd-rpc-ops.c:811:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 7c9ee90c-91a5-45c0-aaf9-8b8a7347b67d
[2011-11-13 16:10:23.74558] I [glusterd-op-sm.c:7077:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2011-11-13 16:10:23.79190] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1020)
[2011-11-13 16:10:23.198846] I [glusterd-handler.c:1078:glusterd_handle_cli_start_volume] 0-glusterd: Received start vol reqfor volume glue
[2011-11-13 16:10:23.198913] I [glusterd-utils.c:243:glusterd_lock] 0-glusterd: Cluster lock held by fb1f46cf-a03a-4fcd-b103-735040af3ced
[2011-11-13 16:10:23.198938] I [glusterd-handler.c:420:glusterd_op_txn_begin] 0-glusterd: Acquired local lock
[2011-11-13 16:10:23.199364] I [glusterd-rpc-ops.c:752:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 7c9ee90c-91a5-45c0-aaf9-8b8a7347b67d
[2011-11-13 16:10:23.199819] I [glusterd-op-sm.c:6543:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 1 peers
[2011-11-13 16:10:23.200396] I [glusterd-rpc-ops.c:1050:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: 7c9ee90c-91a5-45c0-aaf9-8b8a7347b67d
[2011-11-13 16:10:23.724138] I [glusterd-utils.c:1095:glusterd_volume_start_glusterfs] 0-: About to start glusterfs for brick gluster03:/g1
[2011-11-13 16:10:23.989454] I [glusterd-op-sm.c:6660:glusterd_op_ac_send_commit_op] 0-glusterd: Sent op req to 1 peers
[2011-11-13 16:10:24.7044] I [glusterd-pmap.c:237:pmap_registry_bind] 0-pmap: adding brick /g1 on port 24009
[2011-11-13 16:10:24.39658] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1017)
[2011-11-13 16:10:24.816411] I [glusterd-rpc-ops.c:1236:glusterd3_1_commit_op_cbk] 0-glusterd: Received ACC from uuid: 7c9ee90c-91a5-45c0-aaf9-8b8a7347b67d
[2011-11-13 16:10:24.816940] I [glusterd-rpc-ops.c:811:glusterd3_1_cluster_unlock_cbk] 0-glusterd: Received ACC from uuid: 7c9ee90c-91a5-45c0-aaf9-8b8a7347b67d
[2011-11-13 16:10:24.816993] I [glusterd-op-sm.c:7077:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
[2011-11-13 16:10:24.818726] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1019)
[2011-11-13 16:10:24.859565] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.24:1019)

==========================================================================================================

My volume info looks okay, I guess...

gluster volume info

Volume Name: glue
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gluster03:/g1
Brick2: gluster04:/g1
Options Reconfigured:
auth.allow: 10.*

When I mount the volume "glue" on gluster03 using "mount -t nfs gluster03:/glue /mnt", the nfs.log shows:

[2011-11-13 16:18:06.83447] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-glue-client-0: remote operation failed: Invalid argument
[2011-11-13 16:18:06.83507] I [dht-common.c:478:dht_revalidate_cbk] 0-glue-dht: subvolume glue-client-0 for / returned -1 (Invalid argument)
[2011-11-13 16:18:06.84676] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-glue-client-0: remote operation failed: Invalid argument
[2011-11-13 16:18:06.84704] I [dht-common.c:478:dht_revalidate_cbk] 0-glue-dht: subvolume glue-client-0 for / returned -1 (Invalid argument)
[2011-11-13 16:18:06.85687] W [rpc-common.c:64:xdr_to_generic] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d) [0x2ae52ccad6fd] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x2ae52ccad502] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/protocol/client.so(client3_1_stat_cbk+0x91) [0x2aaaaaacccb1]))) 0-xdr: XDR decoding failed
[2011-11-13 16:18:06.85723] E [client3_1-fops.c:398:client3_1_stat_cbk] 0-glue-client-0: error
[2011-11-13 16:18:06.85748] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-glue-client-0: remote operation failed: Invalid argument
[2011-11-13 16:18:06.86273] W [rpc-common.c:64:xdr_to_generic] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d) [0x2ae52ccad6fd] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x2ae52ccad502] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/protocol/client.so(client3_1_stat_cbk+0x91) [0x2aaaaaacccb1]))) 0-xdr: XDR decoding failed
[2011-11-13 16:18:06.86301] E [client3_1-fops.c:398:client3_1_stat_cbk] 0-glue-client-0: error
[2011-11-13 16:18:06.86324] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-glue-client-0: remote operation failed: Invalid argument

==========================================================================================================

When I do "touch /mnt/new", I get "No such file or directory", and nfs.log shows:

[2011-11-13 16:18:06.83447] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-glue-client-0: remote operation failed: Invalid argument
[2011-11-13 16:18:06.83507] I [dht-common.c:478:dht_revalidate_cbk] 0-glue-dht: subvolume glue-client-0 for / returned -1 (Invalid argument)
[2011-11-13 16:18:06.84676] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-glue-client-0: remote operation failed: Invalid argument
[2011-11-13 16:18:06.84704] I [dht-common.c:478:dht_revalidate_cbk] 0-glue-dht: subvolume glue-client-0 for / returned -1 (Invalid argument)
[2011-11-13 16:18:06.85687] W [rpc-common.c:64:xdr_to_generic] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d) [0x2ae52ccad6fd] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x2ae52ccad502] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/protocol/client.so(client3_1_stat_cbk+0x91) [0x2aaaaaacccb1]))) 0-xdr: XDR decoding failed
[2011-11-13 16:18:06.85723] E [client3_1-fops.c:398:client3_1_stat_cbk] 0-glue-client-0: error
[2011-11-13 16:18:06.85748] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-glue-client-0: remote operation failed: Invalid argument
[2011-11-13 16:18:06.86273] W [rpc-common.c:64:xdr_to_generic] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d) [0x2ae52ccad6fd] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x2ae52ccad502] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/protocol/client.so(client3_1_stat_cbk+0x91) [0x2aaaaaacccb1]))) 0-xdr: XDR decoding failed
[2011-11-13 16:18:06.86301] E [client3_1-fops.c:398:client3_1_stat_cbk] 0-glue-client-0: error
[2011-11-13 16:18:06.86324] I [client3_1-fops.c:411:client3_1_stat_cbk] 0-glue-client-0: remote operation failed: Invalid argument
[2011-11-13 16:19:48.424842] I [dht-layout.c:192:dht_layout_search] 0-glue-dht: no subvolume for hash (value) = 1407928635
[2011-11-13 16:19:48.425129] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-glue-client-0: remote operation failed: Invalid argument
[2011-11-13 16:19:48.425751] I [dht-layout.c:192:dht_layout_search] 0-glue-dht: no subvolume for hash (value) = 1407928635
[2011-11-13 16:19:48.425991] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-glue-client-0: remote operation failed: Invalid argument
[2011-11-13 16:19:48.449516] I [dht-layout.c:192:dht_layout_search] 0-glue-dht: no subvolume for hash (value) = 1407928635
[2011-11-13 16:19:48.449662] E [fd.c:465:fd_unref] (-->/opt/glusterfs/3.2.4/lib64/libglusterfs.so.0(default_create_cbk+0xb4) [0x2ae52ca65cc4] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/debug/io-stats.so(io_stats_create_cbk+0x20c) [0x2aaaab76263c] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/nfs/server.so(nfs_fop_create_cbk+0x73) [0x2aaaab988a13]))) 0-fd: fd is NULL
[2011-11-13 16:19:48.449859] W [rpc-common.c:64:xdr_to_generic] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d) [0x2ae52ccad6fd] (-->/opt/glusterfs/3.2.4/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2) [0x2ae52ccad502] (-->/opt/glusterfs/3.2.4/lib64/glusterfs/3.2.4/xlator/protocol/client.so(client3_1_statfs_cbk+0x7e) [0x2aaaaaac806e]))) 0-xdr: XDR decoding failed
[2011-11-13 16:19:48.449888] E [client3_1-fops.c:624:client3_1_statfs_cbk] 0-glue-client-0: error
[2011-11-13 16:19:48.449912] I [client3_1-fops.c:637:client3_1_statfs_cbk] 0-glue-client-0: remote operation failed: Invalid argument
[2011-11-13 16:19:48.450030] I [dht-layout.c:192:dht_layout_search] 0-glue-dht: no subvolume for hash (value) = 1407928635
[2011-11-13 16:19:48.450260] I [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-glue-client-0: remote operation failed: Invalid argument
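
For anyone triaging logs like the ones above, here is a minimal sketch that tallies warning and error entries by the function that emitted them. It assumes the line shape seen in these logs ([timestamp] LEVEL [file.c:line:function] rest); the names LOG_LINE and tally are hypothetical helpers and not part of gluster.

```python
import re
from collections import Counter

# Assumed line shape, taken from the logs in this post:
#   [timestamp] LEVEL [file.c:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<rest>.*)$"
)


def tally(lines):
    """Count W and E entries per (level, function); other lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") in ("W", "E"):
            counts[(m.group("level"), m.group("func"))] += 1
    return counts


# A few entries copied from the logs above.
sample = [
    "[2011-11-13 16:18:06.85723] E [client3_1-fops.c:398:client3_1_stat_cbk] 0-glue-client-0: error",
    "[2011-11-13 16:18:06.86301] E [client3_1-fops.c:398:client3_1_stat_cbk] 0-glue-client-0: error",
    "[2011-11-13 16:10:22.492971] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: "
    "reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1023)",
    "[2011-11-13 16:10:22.489604] I [glusterd-op-sm.c:7077:glusterd_op_txn_complete] 0-glusterd: Cleared local lock",
]

if __name__ == "__main__":
    for (level, func), n in tally(sample).most_common():
        print(level, func, n)
```

To use it on a real file, pass an open file object instead of the sample list, for example tally(open("nfs.log")), where the path is whatever your install uses.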
==========================================================================================================

And from the brick log g1.log, in case this helps:

[2011-11-13 21:46:05.929654] I [glusterfsd.c:1493:main] 0-/opt/glusterfs/3.2.4/sbin/glusterfsd: Started Running /opt/glusterfs/3.2.4/sbin/glusterfsd version 3.2.4
[2011-11-13 21:46:05.946509] W [socket.c:419:__socket_keepalive] 0-socket: failed to set keep idle on socket 8
[2011-11-13 21:46:05.946618] W [socket.c:1846:socket_server_event_handler] 0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
[2011-11-13 21:46:06.72770] W [graph.c:291:gf_add_cmdline_options] 0-glue-server: adding option 'listen-port' for volume 'glue-server' with value '24010'
[2011-11-13 21:46:06.73873] W [rpc-transport.c:447:validate_volume_options] 0-tcp.glue-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2011-11-13 21:46:06.74204] W [posix.c:4686:init] 0-glue-posix: Posix access control list is not supported.
Given volfile:
+------------------------------------------------------------------------------+
  1: volume glue-posix
  2:     type storage/posix
  3:     option directory /g1
  4: end-volume
  5:
  6: volume glue-access-control
  7:     type features/access-control
  8:     subvolumes glue-posix
  9: end-volume
 10:
 11: volume glue-locks
 12:     type features/locks
 13:     subvolumes glue-access-control
 14: end-volume
 15:
 16: volume glue-io-threads
 17:     type performance/io-threads
 18:     subvolumes glue-locks
 19: end-volume
 20:
 21: volume glue-marker
 22:     type features/marker
 23:     option volume-uuid 2b567c80-ab30-44b2-9b17-e67e6e679096
 24:     option timestamp-file /etc/glusterd/vols/glue/marker.tstamp
 25:     option xtime off
 26:     option quota off
 27:     subvolumes glue-io-threads
 28: end-volume
 29:
 30: volume /g1
 31:     type debug/io-stats
 32:     option latency-measurement off
 33:     option count-fop-hits off
 34:     subvolumes glue-marker
 35: end-volume
 36:
 37: volume glue-server
 38:     type protocol/server
 39:     option transport-type tcp
 40:     option auth.addr./g1.allow 10.*
 41:     subvolumes /g1
 42: end-volume
+------------------------------------------------------------------------------+
[2011-11-13 21:46:09.133670] E [authenticate.c:227:gf_authenticate] 0-auth: no authentication module is interested in accepting remote-client (null)
[2011-11-13 21:46:09.133729] E [server-handshake.c:553:server_setvolume] 0-glue-server: Cannot authenticate client from 127.0.0.1:1023 3.2.4
[2011-11-13 21:46:09.389447] I [server-handshake.c:542:server_setvolume] 0-glue-server: accepted client from 10.1.1.24:1022 (version: 3.2.4)
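
The brick log above rejects the client from 127.0.0.1 and accepts the one from 10.1.1.24, while the volfile carries "option auth.addr./g1.allow 10.*". The sketch below only illustrates how those two addresses compare against that pattern. It assumes the allow list is matched as a shell-style wildcard against the client address, which is an assumption about gluster's behaviour and not something these logs confirm; is_allowed is a hypothetical helper, not a gluster function.

```python
from fnmatch import fnmatch

# Pattern copied from the volfile line "option auth.addr./g1.allow 10.*".
ALLOW_PATTERNS = ["10.*"]


def is_allowed(client_addr, patterns=ALLOW_PATTERNS):
    """Return True if the address matches any pattern.

    Assumption: shell-style wildcard matching on the address string.
    """
    return any(fnmatch(client_addr, pattern) for pattern in patterns)


if __name__ == "__main__":
    # The two client addresses that appear in the brick log.
    for addr in ("127.0.0.1", "10.1.1.24"):
        print(addr, "matches" if is_allowed(addr) else "does not match")
```

Under that assumption, 127.0.0.1 does not match "10.*" and 10.1.1.24 does, which lines up with the rejected and accepted clients in the brick log.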