Hello,

I had 1 server and 20 client machines mounting a glusterfs partition. After several weeks of working correctly, the server stopped responding to all clients. Any attempt to access the mounted directory (even a simple `ls`) hangs the application.

Restarting the server made all clients reconnect automatically, which makes me suspect some kind of server-side failure. The odd thing is that the server log shows nothing around the time of the failure:

2009-02-06 12:17:38 E [server-protocol.c:184:generic_reply] server: transport_writev failed
2009-02-06 12:22:23 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.128.105:1023)
----- my restart at 2009-02-10 12:01 -----
2009-02-10 12:01:42 W [glusterfs.c:417:glusterfs_cleanup_and_exit] glusterfs: shutting down server
2009-02-10 12:01:47 E [server-protocol.c:5190:mop_getspec] server: Unable to open /etc/glusterfs/glusterfs-client.vol.192.168.128.101 (No such file or directory)
2009-02-10 12:01:47 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.128.101:1023)
2009-02-10 12:05:19 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.128.103:1023)
2009-02-10 12:05:19 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.128.202:1023)
2009-02-10 12:05:30 E [protocol.c:271:gf_block_unserialize_transport] server: EOF from peer (192.168.128.108:1023)

Nothing was logged between the 2009-02-06 entries and my restart. In the client logs, however, I did find something:

2009-02-10 12:00:26 C [client-protocol.c:211:call_bail] filedata: bailing transport
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 465570: (34) / => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer] filedata: transport_submit failed
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(35) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 465577: (34) / => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer] filedata: transport_submit failed
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 465578: (34) /cust => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer] filedata: transport_submit failed
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 465579: (34) / => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer] filedata: transport_submit failed
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 465581: (34) / => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer] filedata: transport_submit failed
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 465570: (34) / => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 465577: (34) / => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 465578: (34) /cust => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 465579: (34) / => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [client-protocol.c:324:client_protocol_xfer] filedata: transport_submit failed
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 465581: (34) / => -1 (107)
2009-02-10 12:05:30 C [client-protocol.c:211:call_bail] filedata: bailing transport
2009-02-10 12:05:30 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0x8a60938
2009-02-10 12:05:30 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:05:30 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 465585: (34) /cust => -1 (107)
2009-02-10 12:05:30 E [client-protocol.c:324:client_protocol_xfer] filedata: transport_submit failed

Another client log:

2009-02-10 11:59:14 C [client-protocol.c:211:call_bail] filedata: bailing transport
2009-02-10 11:59:14 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0xb66005a0
2009-02-10 11:59:14 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 11:59:14 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 101046: (34) / => -1 (107)
2009-02-10 11:59:14 E [client-protocol.c:324:client_protocol_xfer] filedata: transport_submit failed
2009-02-10 11:59:14 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0xb66005a0
2009-02-10 11:59:14 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 11:59:14 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 101050: (34) / => -1 (107)
2009-02-10 11:59:14 E [client-protocol.c:324:client_protocol_xfer] filedata: transport_submit failed
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0xb66009e0
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 101046: (34) / => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup] filedata: forced unwinding frame type(1) op(34) reply=@0xb66009e0
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 101050: (34) / => -1 (107)

Server version: glusterfs 1.3.8pre6 built on Apr 23 2008 04:34:21
Client version: glusterfs 1.3.8pre6 built on Apr 23 2008 04:31:19

Another possibly relevant detail: each client held about 50 simultaneous connections to the server, so with 20 clients the server was handling up to 1000 connections in total.

Has anyone experienced anything similar before? Is there any fix for this? If you require any additional information, please do not hesitate to ask for it.

Regards,
Ioannis
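
P.S. For anyone who wants to compare their own logs against the excerpts above, here is a minimal sketch in Python (not part of GlusterFS) that tallies entries by severity letter and function name. The regular expression and the `tally` helper are assumptions derived only from the log lines quoted in this message, not from any documented log format.

```python
import re
from collections import Counter

# Assumed entry layout, inferred from the excerpts in this message:
#   <date> <time> <level letter> [<file>:<line>:<function>] <component>: <message>
ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<component>[^:]+): "
)


def tally(text):
    """Count log entries per (level, function), tolerating wrapped lines."""
    # Collapse all whitespace so an entry broken across lines still matches.
    flat = " ".join(text.split())
    return Counter((m["level"], m["func"]) for m in ENTRY.finditer(flat))


# Sample taken from the client log above; the last entry is deliberately wrapped.
sample = """
2009-02-10 12:00:26 C [client-protocol.c:211:call_bail] filedata: bailing transport
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk] filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 465570: (34) / => -1 (107)
2009-02-10 12:00:26 E
[client-protocol.c:324:client_protocol_xfer] filedata: transport_submit failed
"""

counts = tally(sample)
for (level, func), n in counts.most_common():
    print(level, func, n)
```

Running the same function over a whole client log should make it easier to see whether the `call_bail` entries always precede the bursts of ENOTCONN errors.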
begin:vcard
fn:Ioannis Aslanidis
n:Aslanidis;Ioannis
org:Flumotion Services S.A.;Infrastructure Department
adr:Edifici Nord Planta 2;;World Trade Center;Barcelona;Barcelona;08039;Spain
email;internet:iaslanidis@xxxxxxxxxxxxx
title:System and Network Administrator
tel;work:+34935086359
tel;cell:+34627204575
url:http://www.flumotion.com
version:2.1
end:vcard