Server hung and dropped the connections of all clients

Hello,

I had one server and 20 client machines mounting a GlusterFS partition.
After several weeks of working correctly, the server stopped responding
to all clients. Any attempt to list the contents of the mounted
directory, even a simple `ls`, blocked the calling application.

Restarting the server made all clients reconnect automatically, which
suggests that the failure was on the server side.

The problem is that the server logs report almost nothing:

2009-02-06 12:17:38 E [server-protocol.c:184:generic_reply] server:
transport_writev failed
2009-02-06 12:22:23 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (192.168.128.105:1023)
----- my restart at 2009-02-10 12:01 -----
2009-02-10 12:01:42 W [glusterfs.c:417:glusterfs_cleanup_and_exit]
glusterfs: shutting down server
2009-02-10 12:01:47 E [server-protocol.c:5190:mop_getspec] server:
Unable to open /etc/glusterfs/glusterfs-client.vol.192.168.128.101 (No
such file or directory)
2009-02-10 12:01:47 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (192.168.128.101:1023)
2009-02-10 12:05:19 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (192.168.128.103:1023)
2009-02-10 12:05:19 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (192.168.128.202:1023)
2009-02-10 12:05:30 E [protocol.c:271:gf_block_unserialize_transport]
server: EOF from peer (192.168.128.108:1023)


There is nothing in the server log between the two entries from
2009-02-06 and my restart; the client logs, however, do show something:

2009-02-10 12:00:26 C [client-protocol.c:211:call_bail] filedata:
bailing transport
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465570: (34) / => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(35) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465577: (34) / => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465578: (34) /cust => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465579: (34) / => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:00:26 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60960
2009-02-10 12:00:26 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:00:26 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465581: (34) / => -1 (107)
2009-02-10 12:00:26 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465570: (34) / => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465577: (34) / => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465578: (34) /cust => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465579: (34) / => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60fa8
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465581: (34) / => -1 (107)
2009-02-10 12:05:30 C [client-protocol.c:211:call_bail] filedata:
bailing transport
2009-02-10 12:05:30 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0x8a60938
2009-02-10 12:05:30 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:05:30 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
465585: (34) /cust => -1 (107)
2009-02-10 12:05:30 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed


Another client log:

2009-02-10 11:59:14 C [client-protocol.c:211:call_bail] filedata:
bailing transport
2009-02-10 11:59:14 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0xb66005a0
2009-02-10 11:59:14 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 11:59:14 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
101046: (34) / => -1 (107)
2009-02-10 11:59:14 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 11:59:14 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0xb66005a0
2009-02-10 11:59:14 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 11:59:14 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
101050: (34) / => -1 (107)
2009-02-10 11:59:14 E [client-protocol.c:324:client_protocol_xfer]
filedata: transport_submit failed
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0xb66009e0
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
101046: (34) / => -1 (107)
2009-02-10 12:01:42 E [client-protocol.c:4809:client_protocol_cleanup]
filedata: forced unwinding frame type(1) op(34) reply=@0xb66009e0
2009-02-10 12:01:42 E [client-protocol.c:4405:client_lookup_cbk]
filedata: no proper reply from server, returning ENOTCONN
2009-02-10 12:01:42 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse:
101050: (34) / => -1 (107)
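
For anyone reading the client logs above: the number in parentheses at
the end of each fuse_entry_cbk line is an errno value, and on Linux 107
is ENOTCONN, which matches the client_lookup_cbk messages. Below is a
minimal Python sketch that pulls the fields out of such a line and looks
the errno up. The function name, the regular expression, and the field
names (in particular calling the first number a request id) are
assumptions made for illustration, not anything taken from GlusterFS
itself, and the errno name printed depends on the platform the script
runs on.

```python
import errno
import os
import re

# Matches the tail of a fuse_entry_cbk line, e.g. "465578: (34) /cust => -1 (107)".
# Field meanings are assumed: leading id, op number, path, return value, errno.
FUSE_RESULT = re.compile(r"(\d+): \((\d+)\) (\S+) => (-?\d+) \((\d+)\)")


def decode_fuse_result(text):
    """Return the parsed fields of a fuse_entry_cbk result, or None if no match."""
    match = FUSE_RESULT.search(text)
    if match is None:
        return None
    request_id, op, path, ret, err = match.groups()
    err = int(err)
    return {
        "request_id": int(request_id),
        "op": int(op),
        "path": path,
        "ret": int(ret),
        "errno": err,
        # Symbolic name and message come from the local platform's errno table.
        "errno_name": errno.errorcode.get(err, "UNKNOWN"),
        "message": os.strerror(err),
    }


if __name__ == "__main__":
    print(decode_fuse_result("465578: (34) /cust => -1 (107)"))
```

Run on a Linux client, this should report ENOTCONN for every failed
lookup shown above, so the clients only saw a dead connection and the
logs do not by themselves say why the server stopped answering.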


Server version: glusterfs 1.3.8pre6 built on Apr 23 2008 04:34:21
Client version: glusterfs 1.3.8pre6 built on Apr 23 2008 04:31:19

One more detail that may be relevant: each client held about 50
simultaneous connections to the server, for a total of roughly 1000
connections.
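
Roughly 1000 connections is close to 1024, a common default soft limit
on open file descriptors per process on Linux, so descriptor exhaustion
may be worth ruling out. This is only a guess; nothing in the logs
confirms it. A minimal Python sketch of the arithmetic and the check
follows. The helper name is hypothetical, and the script reports the
limit of the process that runs it, not of the running glusterfs server
process, so it would have to be run in the same environment that starts
the server.

```python
import resource


def fd_headroom(expected_connections):
    """Return (soft_limit, headroom) for the current process.

    headroom is None when the soft limit is unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, None
    return soft, soft - expected_connections


# Figures from the description above: 20 clients, about 50 connections each.
clients = 20
connections_per_client = 50
total = clients * connections_per_client

if __name__ == "__main__":
    soft, headroom = fd_headroom(total)
    print(f"expected connections: {total}")
    print(f"soft fd limit: {soft}, headroom: {headroom}")
```

A small or negative headroom would not prove anything, since the server
also needs descriptors for its log and data files, but it would make the
limit a reasonable thing to raise and retest.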

Has anyone experienced anything similar before? Is there any fix for this?

If you require any additional information, please do not hesitate to
ask for it.

Regards,

Ioannis Aslanidis
System and Network Administrator, Infrastructure Department
Flumotion Services S.A.
http://www.flumotion.com
