Hey guys,

I've been using glusterfs to share a volume between two webservers happily for quite a while. However, for some reason, they've got into a state where typing 'df -k' causes both to hang, resulting in a loss of service for 42 seconds.

I see the following messages in the log files. Any ideas what might be causing this?

Server1

Glusterfs.log (i.e. the client log):

[2011-01-15 11:22:54] E [saved-frames.c:165:saved_frames_unwind] 10.10.130.11-1: forced unwinding frame type(1) op(LOOKUP)
[2011-01-15 11:22:54] E [saved-frames.c:165:saved_frames_unwind] 10.10.130.11-1: forced unwinding frame type(1) op(LOOKUP)
[2011-01-15 11:22:54] E [saved-frames.c:165:saved_frames_unwind] 10.10.130.11-1: forced unwinding frame type(1) op(LOOKUP)
[2011-01-15 11:22:54] E [saved-frames.c:165:saved_frames_unwind] 10.10.130.11-1: forced unwinding frame type(1) op(LOOKUP)
[2011-01-15 11:22:54] E [saved-frames.c:165:saved_frames_unwind] 10.10.130.11-1: forced unwinding frame type(2) op(PING)
[2011-01-15 11:22:54] N [client-protocol.c:6976:notify] 10.10.130.11-1: disconnected
[2011-01-15 11:22:54] N [client-protocol.c:6228:client_setvolume_cbk] 10.10.130.11-1: Connected to 10.10.130.11:6996, attached to remote volume 'brick1'.
[2011-01-15 11:22:54] N [client-protocol.c:6228:client_setvolume_cbk] 10.10.130.11-1: Connected to 10.10.130.11:6996, attached to remote volume 'brick1'.

Glusterfsd.log:

[2011-01-15 11:22:54] N [server-protocol.c:6748:notify] server-tcp: 10.10.130.12:1023 disconnected
[2011-01-15 11:22:54] N [server-protocol.c:6748:notify] server-tcp: 10.10.130.11:1022 disconnected
[2011-01-15 11:22:54] N [server-protocol.c:6748:notify] server-tcp: 10.10.130.12:1022 disconnected
[2011-01-15 11:22:54] N [server-helpers.c:842:server_connection_destroy] server-tcp: destroyed connection of w3-4176-2010/10/19-06:35:34:26343-10.10.130.11-1
[2011-01-15 11:22:54] N [server-protocol.c:6748:notify] server-tcp: 10.10.130.11:1018 disconnected
[2011-01-15 11:22:54] N [server-helpers.c:842:server_connection_destroy] server-tcp: destroyed connection of w2-827-2011/01/15-11:09:38:7996-10.10.130.11-1
[2011-01-15 11:22:54] N [server-protocol.c:5812:mop_setvolume] server-tcp: accepted client from 10.10.130.12:1019
[2011-01-15 11:22:54] N [server-protocol.c:5812:mop_setvolume] server-tcp: accepted client from 10.10.130.12:1018
[2011-01-15 11:22:54] N [server-protocol.c:5812:mop_setvolume] server-tcp: accepted client from 10.10.130.11:1023
[2011-01-15 11:22:54] N [server-protocol.c:5812:mop_setvolume] server-tcp: accepted client from 10.10.130.11:1019

Server2

Client log:

[2011-01-15 11:21:47] E [client-protocol.c:415:client_ping_timer_expired] 10.10.130.11-1: Server 10.10.130.11:6996 has not responded in the last 42 seconds, disconnecting.
[2011-01-15 11:21:47] E [saved-frames.c:165:saved_frames_unwind] 10.10.130.11-1: forced unwinding frame type(1) op(STATFS)
[2011-01-15 11:21:47] E [saved-frames.c:165:saved_frames_unwind] 10.10.130.11-1: forced unwinding frame type(1) op(LOOKUP)
[2011-01-15 11:21:47] E [saved-frames.c:165:saved_frames_unwind] 10.10.130.11-1: forced unwinding frame type(1) op(LOOKUP)
[2011-01-15 11:21:47] N [client-protocol.c:6976:notify] 10.10.130.11-1: disconnected
[2011-01-15 11:22:54] N [client-protocol.c:6228:client_setvolume_cbk] 10.10.130.11-1: Connected to 10.10.130.11:6996, attached to remote volume 'brick1'.
[2011-01-15 11:22:54] N [client-protocol.c:6228:client_setvolume_cbk] 10.10.130.11-1: Connected to 10.10.130.11:6996, attached to remote volume 'brick1'.

Note that the 2nd server doesn't show anything in the server log.

My glusterfsd.vol:

volume posix1
    type storage/posix
    option directory /data/export
end-volume

volume brick1
    type features/locks
    subvolumes posix1
end-volume

volume server-tcp
    type protocol/server
    option transport-type tcp
    option auth.addr.brick1.allow *
    option transport.socket.listen-port 6996
    option transport.socket.nodelay on
    subvolumes brick1
end-volume

repstore.vol:

## file auto generated by /usr/bin/glusterfs-volgen (mount.vol)
# Cmd line:
# $ /usr/bin/glusterfs-volgen --name repstore1 --raid 1 10.10.130.11:/data/export 10.10.130.12:/data/export

# RAID 1
# TRANSPORT-TYPE tcp
volume 10.10.130.12-1
    type protocol/client
    option transport-type tcp
    option remote-host 10.10.130.12
    option transport.socket.nodelay on
    option transport.remote-port 6996
    option remote-subvolume brick1
end-volume

volume 10.10.130.11-1
    type protocol/client
    option transport-type tcp
    option remote-host 10.10.130.11
    option transport.socket.nodelay on
    option transport.remote-port 6996
    option remote-subvolume brick1
end-volume

volume mirror-0
    type cluster/replicate
    subvolumes 10.10.130.11-1 10.10.130.12-1
end-volume

volume writebehind
    type performance/write-behind
    option cache-size 4MB
    subvolumes mirror-0
end-volume

volume iocache
    type performance/io-cache
    option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
    option cache-timeout 60
    subvolumes writebehind
end-volume

 -- joe.

Joe Warren-Meeks
Director Of Systems Development
ENCORE TICKETS LTD
Encore House, 50-51 Bedford Row, London WC1R 4LR
Direct line: +44 (0)20 7492 1506
Reservations: +44 (0)20 7492 1500
Fax: +44 (0)20 7831 4410
Email: joe at encoretickets.co.uk
web: www.encoretickets.co.uk
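
P.S. In case it helps anyone reading the logs above, here is a rough Python sketch that pulls the disconnect and reconnect times out of a client log and reports how long each connection was down. It is only an illustration: the function names (parse_line, outage_windows) are made up, it is not part of glusterfs, and the regular expression simply assumes the line layout seen in the excerpts in this mail. The sample lines are copied from the Server2 client log.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts in this mail:
# [YYYY-MM-DD HH:MM:SS] LEVEL [file.c:line:function] name: message
LINE_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] (\w) "
    r"\[([\w-]+\.c):(\d+):(\w+)\] ([^ ]+): (.*)"
)


def parse_line(line):
    """Return (timestamp, level, function, name, message), or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts, level, _src, _lineno, func, name, msg = m.groups()
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), level, func, name, msg


def outage_windows(lines):
    """Yield (name, down_at, up_at) for each disconnect followed by a reconnect."""
    down = {}  # connection name -> time of the first unmatched disconnect
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, _level, func, name, msg = parsed
        if func == "notify" and msg == "disconnected":
            down.setdefault(name, when)
        elif (func == "client_setvolume_cbk"
              and msg.startswith("Connected to")
              and name in down):
            yield name, down.pop(name), when


# Sample lines copied from the Server2 client log in this mail.
SAMPLE = """\
[2011-01-15 11:21:47] E [client-protocol.c:415:client_ping_timer_expired] 10.10.130.11-1: Server 10.10.130.11:6996 has not responded in the last 42 seconds, disconnecting.
[2011-01-15 11:21:47] N [client-protocol.c:6976:notify] 10.10.130.11-1: disconnected
[2011-01-15 11:22:54] N [client-protocol.c:6228:client_setvolume_cbk] 10.10.130.11-1: Connected to 10.10.130.11:6996, attached to remote volume 'brick1'.
[2011-01-15 11:22:54] N [client-protocol.c:6228:client_setvolume_cbk] 10.10.130.11-1: Connected to 10.10.130.11:6996, attached to remote volume 'brick1'.
"""

if __name__ == "__main__":
    for name, down_at, up_at in outage_windows(SAMPLE.splitlines()):
        secs = (up_at - down_at).total_seconds()
        print(f"{name}: down {down_at}, back {up_at} ({secs:.0f} s)")
```

On the Server2 excerpt this reports a 67 second gap between the disconnect at 11:21:47 and the reconnect at 11:22:54, which comes on top of the 42 seconds the client had already waited before giving up on the ping.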