Gowda,

I'm running gluster in a production environment and getting errors similar to Rohan's. We have 8 computers running as servers and 11 clients connecting to the servers (including the 8 clients that are also servers). The servers are grouped into 4 mirrors and then unified as one large filesystem.

This is our current client spec file:

volume client1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.46
  option remote-subvolume brick
  option transport-timeout 10
end-volume

volume client2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.47
  option remote-subvolume brick
  option transport-timeout 10
end-volume

volume client3
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.48
  option remote-subvolume brick
  option transport-timeout 10
end-volume

volume client4
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.49
  option remote-subvolume brick
  option transport-timeout 10
end-volume

volume client5
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.50
  option remote-subvolume brick
  option transport-timeout 10
end-volume

volume client6
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.65
  option remote-subvolume brick
  option transport-timeout 10
end-volume

volume client7
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.43
  option remote-subvolume brick
  option transport-timeout 10
end-volume

volume client8
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.44
  option remote-subvolume brick
  option transport-timeout 10
end-volume

### Client Namespaces
volume client1ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.46
  option remote-subvolume brickns
  option transport-timeout 10
end-volume

volume client2ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.47
  option remote-subvolume brickns
  option transport-timeout 10
end-volume

volume brick-ns
  type cluster/afr
  subvolumes client1ns client2ns
end-volume

volume afr1
  type cluster/afr
  subvolumes client1 client2
end-volume

volume afr2
  type cluster/afr
  subvolumes client3 client4
end-volume

volume afr3
  type cluster/afr
  subvolumes client5 client6
end-volume

volume afr4
  type cluster/afr
  subvolumes client7 client8
end-volume

# Unify
volume bricks
  type cluster/unify
  subvolumes afr1 afr2 afr3 afr4
  option namespace brick-ns
end-volume

This is the server spec file:

volume brick
  type storage/posix                        # POSIX FS translator
  option directory /var/gluster/export      # Export this directory
end-volume

volume brickns
  type storage/posix                        # POSIX FS translator
  option directory /var/gluster/export-ns   # Export this directory
end-volume

### Add network serving capability to above brick.
volume server
  type protocol/server
  option transport-type tcp/server          # For TCP/IP transport
  subvolumes brick brickns
  option auth.ip.brick.allow 192.168.2.*    # Allow access to "brick" volume
  option auth.ip.brickns.allow 192.168.2.*  # Allow access to "brickns" volume
end-volume

Here are the server logs from the last time the servers failed:

2008-01-29 05:22:44 E [protocol.c:254:gf_block_unserialize_transport] server: EOF from peer (192.168.2.49:1023)
2008-01-29 05:22:44 E [posix.c:1209:posix_close] brick: pfd->dir is 0x512ff0 (not NULL) for file fd=0x50ed50
2008-01-29 05:22:44 E [posix.c:1209:posix_close] brick: pfd->dir is 0x2aaaab002610 (not NULL) for file fd=0x2aaaab000d10
2008-01-29 05:22:44 C [tcp.c:81:tcp_disconnect] server: connection disconnected
2008-01-29 05:25:19 E [protocol.c:254:gf_block_unserialize_transport] server: EOF from peer (192.168.2.43:1001)
2008-01-29 05:25:19 C [tcp.c:81:tcp_disconnect] server: connection disconnected

Here are the client logs:

2008-01-29 05:22:16 C [client-protocol.c:217:call_bail] client4: bailing transport
2008-01-29 05:22:16 E [client-protocol.c:4555:client_protocol_cleanup] client4: forced unwinding frame type(0) op(34) reply=@0x532d20
2008-01-29 05:22:16 C [tcp.c:81:tcp_disconnect] client4: connection disconnected
2008-01-29 05:22:36 C [client-protocol.c:217:call_bail] client4: bailing transport
2008-01-29 05:22:36 E [client-protocol.c:4555:client_protocol_cleanup] client4: forced unwinding frame type(0) op(34) reply=@0x5398d0
2008-01-29 05:22:36 C [tcp.c:81:tcp_disconnect] client4: connection disconnected
2008-01-29 05:22:56 C [client-protocol.c:217:call_bail] client4: bailing transport
2008-01-29 05:22:56 E [client-protocol.c:4555:client_protocol_cleanup] client4: forced unwinding frame type(0) op(34) reply=@0x532a20
2008-01-29 05:22:56 C [tcp.c:81:tcp_disconnect] client4: connection disconnected
2008-01-29 05:23:16 C [client-protocol.c:217:call_bail] client4: bailing transport
2008-01-29 05:23:16 E [client-protocol.c:4555:client_protocol_cleanup] client4: forced unwinding frame type(0) op(34) reply=@0x531c40
2008-01-29 05:23:16 C [tcp.c:81:tcp_disconnect] client4: connection disconnected
2008-01-29 05:24:14 C [client-protocol.c:217:call_bail] client4: bailing transport
2008-01-29 05:24:14 E [client-protocol.c:4555:client_protocol_cleanup] client4: forced unwinding frame type(0) op(34) reply=@0x53a7d0
2008-01-29 05:24:14 C [tcp.c:81:tcp_disconnect] client4: connection disconnected
2008-01-29 05:25:19 C [client-protocol.c:217:call_bail] client4: bailing transport
2008-01-29 05:25:19 E [client-protocol.c:4555:client_protocol_cleanup] client4: forced unwinding frame type(0) op(34) reply=@0x532be0
2008-01-29 05:25:19 C [tcp.c:81:tcp_disconnect] client4: connection disconnected
2008-01-29 05:25:37 C [client-protocol.c:217:call_bail] client4: bailing transport
2008-01-29 05:25:37 E [client-protocol.c:4555:client_protocol_cleanup] client4: forced unwinding frame type(0) op(34) reply=@0x5338a0
2008-01-29 05:25:37 E [client-protocol.c:4555:client_protocol_cleanup] client4: forced unwinding frame type(0) op(34) reply=@0x5338a0
2008-01-29 05:25:37 C [tcp.c:81:tcp_disconnect] client4: connection disconnected
2008-01-29 05:26:10 C [client-protocol.c:217:call_bail] client4: bailing transport
2008-01-29 05:26:10 E [client-protocol.c:4555:client_protocol_cleanup] client4: forced unwinding frame type(0) op(34) reply=@0x533a80
2008-01-29 05:26:10 C [tcp.c:81:tcp_disconnect] client4: connection disconnected
2008-01-29 05:27:20 C [client-protocol.c:217:call_bail] client4: bailing transport
2008-01-29 05:27:20 E [client-protocol.c:4555:client_protocol_cleanup] client4: forced unwinding frame type(0) op(34) reply=@0x53f0e0
2008-01-29 05:27:20 C [tcp.c:81:tcp_disconnect] client4: connection disconnected
2008-01-29 05:27:59 E [tcp-client.c:171:tcp_connect] client4: non-blocking connect() returned: 113 (No route to host)
2008-01-29 10:16:24 E [fuse-bridge.c:431:fuse_entry_cbk] glusterfs-fuse: 6336: /weatherflow_wp/cache/wp-cache-8015965cebd529c3433e331c330e193b.meta=> -1 (2)
2008-01-29 10:16:24 E [fuse-bridge.c:431:fuse_entry_cbk] glusterfs-fuse: 6336: /weatherflow_wp/cache/wp-cache-8015965cebd529c3433e331c330e193b.meta=> -1 (2)
2008-01-29 10:16:41 E [fuse-bridge.c:431:fuse_entry_cbk] glusterfs-fuse: 6442: /weatherflow_wp/cache/wp-cache-bb98ceeeb4ca784b406a1aedd2df7bc0.meta=> -1 (2)

We are running fuse 2.6.5-2.fc5 and glusterfs--mainline--2.5 patch-643.

The problem starts around 5:00am when our cleanup scripts run. It was failing every day when updatedb ran across the cluster on every server, but since I set up updatedb to ignore gluster mounts, the system is able to stay up much longer.

On Jan 29, 2008 8:19 AM, Basavanagowda Kanur <gowda@xxxxxxxxxxxxx> wrote:
> Rohan,
> Can you please send us the server logs and also the volume spec files of
> both server and client? That will help us find out the exact reason for
> this problem.
>
> --
> Gowda
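
In case it helps with triage, the call_bail entries in the client log above can be tallied per client volume with a short script. This is only a sketch: it assumes the log line layout shown above (date, time, level, [file:line:function], volume name), and the names count_bails and BAIL_RE are made up for illustration and are not part of glusterfs.

```python
import re
from collections import Counter

# Matches client log lines of the form shown in this message, e.g.:
# 2008-01-29 05:22:16 C [client-protocol.c:217:call_bail] client4: bailing transport
# Group 3 captures the volume name (e.g. "client4").
BAIL_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) \w "
    r"\[[^\]]*:call_bail\] ([^:]+): bailing transport"
)


def count_bails(lines):
    """Count 'bailing transport' events per client volume (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = BAIL_RE.match(line.strip())
        if match:
            counts[match.group(3)] += 1
    return counts


# Three lines copied from the client log above; only two are call_bail events.
sample = [
    "2008-01-29 05:22:16 C [client-protocol.c:217:call_bail] client4: bailing transport",
    "2008-01-29 05:22:16 C [tcp.c:81:tcp_disconnect] client4: connection disconnected",
    "2008-01-29 05:22:36 C [client-protocol.c:217:call_bail] client4: bailing transport",
]

for volume, n in count_bails(sample).items():
    print(volume, n)
```

To run it over a whole log, pass an open file object instead of the sample list. In the excerpt above every bail is on client4, which is the volume pointing at 192.168.2.49.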