The log of that particular volume says:

[2014-02-18 09:43:17.136182] W [socket.c:410:__socket_keepalive] 0-socket: failed to set keep idle on socket 8
[2014-02-18 09:43:17.136285] W [socket.c:1876:socket_server_event_handler] 0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
[2014-02-18 09:43:18.343409] I [server-handshake.c:571:server_setvolume] 0-teoswitch_default_storage-server: accepted client from xxxxx55.domain.com-2075-2014/02/18-09:43:14:302234-teoswitch_default_storage-client-1-0 (version: 3.3.0)
[2014-02-18 09:43:21.356302] I [server-handshake.c:571:server_setvolume] 0-teoswitch_default_storage-server: accepted client from xxxxx54.domain.com-9651-2014/02/18-09:42:00:141779-teoswitch_default_storage-client-1-0 (version: 3.3.0)
[2014-02-18 10:38:26.488333] W [socket.c:195:__socket_rwv] 0-tcp.teoswitch_default_storage-server: readv failed (Connection timed out)
[2014-02-18 10:38:26.488431] I [server.c:685:server_rpc_notify] 0-teoswitch_default_storage-server: disconnecting connectionfrom xxxxx54.hexacta.com-9651-2014/02/18-09:42:00:141779-teoswitch_default_storage-client-1-0
[2014-02-18 10:38:26.488494] I [server-helpers.c:741:server_connection_put] 0-teoswitch_default_storage-server: Shutting down connection xxxxx54.hexacta.com-9651-2014/02/18-09:42:00:141779-teoswitch_default_storage-client-1-0
[2014-02-18 10:38:26.488541] I [server-helpers.c:629:server_connection_destroy] 0-teoswitch_default_storage-server: destroyed connection of xxxxx54.hexacta.com-9651-2014/02/18-09:42:00:141779-teoswitch_default_storage-client-1-0

When I try to access the folder I get:
[root@hxteo55 ~]# ll /<path>/1001/voicemail/
ls: /<path>/1001/voicemail/: Input/output error

This is the volume info:

Volume Name: teoswitch_default_storage
Type: Distribute
Volume ID: 83c9d6f3-0288-4358-9fdc-b1d062cc8fca
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 12.12.123.54:/<path>/gluster/36779974/teoswitch_default_storage
Brick2: 12.12.123.55:/<path>/gluster/36779974/teoswitch_default_storage

Any ideas?

Marco Zanger
Phone 54 11 5299-5400 (int. 5501)
Clay 2954, C1426DLD, Buenos Aires, Argentina

-----Original Message-----
From: Vijay Bellur [mailto:vbellur@xxxxxxxxxx]
Sent: Tuesday, February 18, 2014 03:56 a.m.
To: Marco Zanger; gluster-users@xxxxxxxxxxx
Subject: Re: Node down and volumes unreachable

On 02/17/2014 11:19 PM, Marco Zanger wrote:
> Read/write operations hang for a long period of time (too long). I've
> seen it in that state (waiting) for something like 5 minutes, which
> makes every application fail trying to read or write. These are the
> errors I found in the logs on server A, which is still accessible
> (B was down):
>
> etc-glusterfs-glusterd.vol.log
>
> ...
> [2014-01-31 07:56:49.780247] W [socket.c:1512:__socket_proto_state_machine] 0-management: reading from socket failed. Error (Connection timed out), peer (<SERVER_B_IP>:24007)
> [2014-01-31 07:58:25.965783] E [socket.c:1715:socket_connect_finish] 0-management: connection to <SERVER_B_IP>:24007 failed (No route to host)
> [2014-01-31 08:59:33.923250] I [glusterd-handshake.c:397:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd mgmt, Num (1238433), Version (2)
> [2014-01-31 08:59:33.923289] I [glusterd-handshake.c:403:glusterd_set_clnt_mgmt_program] 0-: Using Program Peer mgmt, Num (1238437), Version (2)
> ...
>
>
> glustershd.log
>
> [2014-01-27 12:07:03.644849] W [socket.c:1512:__socket_proto_state_machine] 0-teoswitch_custom_music-client-1: reading from socket failed. Error (Connection timed out), peer (<SERVER_B_IP>:24010)
> [2014-01-27 12:07:03.644888] I [client.c:2090:client_rpc_notify] 0-teoswitch_custom_music-client-1: disconnected
> [2014-01-27 12:09:35.553628] E [socket.c:1715:socket_connect_finish] 0-teoswitch_greetings-client-1: connection to <SERVER_B_IP>:24011 failed (Connection timed out)
> [2014-01-27 12:10:13.588148] E [socket.c:1715:socket_connect_finish] 0-license_path-client-1: connection to <SERVER_B_IP>:24013 failed (Connection timed out)
> [2014-01-27 12:10:15.593699] E [socket.c:1715:socket_connect_finish] 0-upload_path-client-1: connection to <SERVER_B_IP>:24009 failed (Connection timed out)
> [2014-01-27 12:10:21.601670] E [socket.c:1715:socket_connect_finish] 0-teoswitch_ivr_greetings-client-1: connection to <SERVER_B_IP>:24012 failed (Connection timed out)
> [2014-01-27 12:10:23.607312] E [socket.c:1715:socket_connect_finish] 0-teoswitch_custom_music-client-1: connection to <SERVER_B_IP>:24010 failed (Connection timed out)
> [2014-01-27 12:11:21.866604] E [afr-self-heald.c:418:_crawl_proceed] 0-teoswitch_ivr_greetings-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.867874] E [afr-self-heald.c:418:_crawl_proceed] 0-teoswitch_greetings-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.868134] E [afr-self-heald.c:418:_crawl_proceed] 0-teoswitch_custom_music-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.869417] E [afr-self-heald.c:418:_crawl_proceed] 0-license_path-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:11:21.869659] E [afr-self-heald.c:418:_crawl_proceed] 0-upload_path-replicate-0: Stopping crawl as < 2 children are up
> [2014-01-27 12:12:53.948154] I [client-handshake.c:1636:select_server_supported_programs] 0-teoswitch_greetings-client-1: Using Program GlusterFS 3.3.0, Num (1298437), Version (330)
> [2014-01-27 12:12:53.952894] I [client-handshake.c:1433:client_setvolume_cbk] 0-teoswitch_greetings-client-1: Connected to <SERVER_B_IP>:24011, attached to remote volume
>
> In nfs.log there are lots of errors, but the one that appears most often is this:
>
> [2014-01-27 12:12:27.136033] E [socket.c:1715:socket_connect_finish] 0-teoswitch_custom_music-client-1: connection to <SERVER_B_IP>:24010 failed (Connection timed out)
>
> Any ideas? The logs only confirm that A cannot reach B, which makes
> sense since B is down. But A is not, and its volume should still be
> accessible. Right?

Nothing very obvious from these logs. Can you share relevant portions of the client log file? Usually the name of the mount point would be a part of the client log file.

-Vijay

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
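
For anyone following the thread, here is a rough Python sketch of how one might locate the client log mentioned in the reply and pull out only the warning and error entries. It assumes the usual convention that the FUSE client log lives under /var/log/glusterfs and is named after the mount point with slashes turned into dashes. The mount point /mnt/voicemail and the helper names client_log_path and warnings_and_errors are made up for illustration and do not come from this thread or from GlusterFS itself.

```python
import re

# Assumption: default GlusterFS log directory; adjust if your install differs.
LOG_DIR = "/var/log/glusterfs"

# Matches entries shaped like the ones quoted in this thread:
# [timestamp] LEVEL [file.c:line:function] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def client_log_path(mount_point, log_dir=LOG_DIR):
    """Guess the client log file for a mount point (hypothetical helper)."""
    name = mount_point.strip("/").replace("/", "-")
    return f"{log_dir}/{name}.log"


def warnings_and_errors(lines):
    """Yield (timestamp, level, message) for W and E entries only."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in ("W", "E"):
            yield match.group("ts"), match.group("level"), match.group("msg")


if __name__ == "__main__":
    # /mnt/voicemail is a placeholder mount point, not the real one.
    path = client_log_path("/mnt/voicemail")
    print("would read:", path)
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for ts, level, msg in warnings_and_errors(handle):
                print(ts, level, msg)
    except OSError as exc:
        print("could not open log:", exc)
```

Entries that were wrapped across several lines by a mail client will not match the pattern, so this is best run against the log file on disk rather than against text pasted from an email.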