I am trying a workaround with clients:

volume pnc4
  type protocol/client
  option transport-type tcp
  option remote-host teoria4
  option frame-timeout 180000
  option ping-timeout 1
  option remote-subvolume dados
end-volume

....

volume replicate
  type cluster/replicate
  subvolumes teoria3 teoria4
end-volume

On the server side, I avoid autoscaling in io-threads. This way the
"bailing out frame" error disappeared and the system is stable.
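For reference, the server-side half of this workaround is just an
io-threads volume with a fixed pool instead of autoscaling. A minimal
sketch, assuming the fixed pool size is set with the thread-count option;
the volume names and the count of 16 are illustrative, not taken from the
thread:

volume dados-iot
  type performance/io-threads
  option thread-count 16      # fixed pool size; no autoscaling option set
  subvolumes dados-locks      # assumed features/locks volume over storage/posix
end-volume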
2009/6/3 Alpha Electronics <myitouchs@xxxxxxxxx>:
> We applied the patch mentioned in the thread, and use a fixed thread
> count in the server config. Unfortunately, we got the same error:
>
> [2009-06-03 04:57:36] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse: 22347008: ERR => -1 (Resource temporarily unavailable)
> [2009-06-03 07:55:04] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse: 23431094: ERR => -1 (Resource temporarily unavailable)
> [2009-06-03 15:58:25] E [client-protocol.c:292:call_bail] brick1: bailing out frame LOOKUP(32) frame sent = 2009-06-03 15:28:23. frame-timeout = 1800
>
> John
>
> On Tue, Jun 2, 2009 at 12:25 AM, Shehjar Tikoo <shehjart@xxxxxxxxxxx> wrote:
>> Hi
>>
>> > Also, avoid using autoscaling in io-threads for now.
>> >
>> > -Shehjar
>>
>> -Shehjar
>>
>> Alpha Electronics wrote:
>>> Thanks for looking into this. We do use io-threads. Here is the
>>> server config:
>>>
>>> volume brick1-posix
>>>   type storage/posix
>>>   option directory /mnt/brick1
>>> end-volume
>>>
>>> volume brick2-posix
>>>   type storage/posix
>>>   option directory /mnt/brick2
>>> end-volume
>>>
>>> volume brick1-locks
>>>   type features/locks
>>>   subvolumes brick1-posix
>>> end-volume
>>>
>>> volume brick2-locks
>>>   type features/locks
>>>   subvolumes brick2-posix
>>> end-volume
>>>
>>> volume brick1
>>>   type performance/io-threads
>>>   option min-threads 16
>>>   option autoscaling on
>>>   subvolumes brick1-locks
>>> end-volume
>>>
>>> volume brick2
>>>   type performance/io-threads
>>>   option min-threads 16
>>>   option autoscaling on
>>>   subvolumes brick2-locks
>>> end-volume
>>>
>>> volume server
>>>   type protocol/server
>>>   option transport-type tcp
>>>   option auth.addr.brick1.allow *
>>>   option auth.addr.brick2.allow *
>>>   subvolumes brick1 brick2
>>> end-volume
>>>
>>> On Sun, May 31, 2009 at 11:44 PM, Shehjar Tikoo <shehjart@xxxxxxxxxxx> wrote:
>>>> Alpha Electronics wrote:
>>>>> We are testing GlusterFS before recommending it to enterprise
>>>>> clients. We found that the file system always hangs after running
>>>>> for about 2 days. After killing the server-side process and
>>>>> restarting it, everything goes back to normal.
>>>>
>>>> What is the server config?
>>>> If you're not using io-threads on the server, I suggest you do,
>>>> because it does basic load-balancing to avoid timeouts.
>>>>
>>>> Also, avoid using autoscaling in io-threads for now.
>>>>
>>>> -Shehjar
>>>>
>>>>> Here is the spec, and the errors logged:
>>>>> GlusterFS version: v2.0.1
>>>>>
>>>>> Client volume:
>>>>> volume brick_1
>>>>>   type protocol/client
>>>>>   option transport-type tcp/client
>>>>>   option remote-port 7777 # Non-default port
>>>>>   option remote-host server1
>>>>>   option remote-subvolume brick
>>>>> end-volume
>>>>>
>>>>> volume brick_2
>>>>>   type protocol/client
>>>>>   option transport-type tcp/client
>>>>>   option remote-port 7777 # Non-default port
>>>>>   option remote-host server2
>>>>>   option remote-subvolume brick
>>>>> end-volume
>>>>>
>>>>> volume bricks
>>>>>   type cluster/distribute
>>>>>   subvolumes brick_1 brick_2
>>>>> end-volume
>>>>>
>>>>> Errors logged on the client side in /var/log/glusterfs.log:
>>>>> [2009-05-29 14:58:55] E [client-protocol.c:292:call_bail] brick_1: bailing out frame LK(28) frame sent = 2009-05-29 14:28:54. frame-timeout = 1800
>>>>> [2009-05-29 14:58:55] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse: 106850788: ERR => -1 (Transport endpoint is not connected)
>>>>>
>>>>> Errors logged on the server:
>>>>> [2009-05-29 14:59:15] E [client-protocol.c:292:call_bail] brick_2: bailing out frame LK(28) frame sent = 2009-05-29 14:29:05. frame-timeout = 1800
>>>>> [2009-05-29 14:59:15] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse: 106850860: ERR => -1 (Transport endpoint is not connected)
>>>>>
>>>>> There are also error messages logged on the server side after 1 hour
>>>>> in /var/log/messages:
>>>>> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0] lib/util_sock.c:write_data(564)
>>>>> May 29 16:04:16 server2 winbindd[3649]: write_data: write failure. Error = Connection reset by peer
>>>>> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0] libsmb/clientgen.c:write_socket(158)
>>>>> May 29 16:04:16 server2 winbindd[3649]: write_socket: Error writing 104 bytes to socket 18: ERRNO = Connection reset by peer
>>>>> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0] libsmb/clientgen.c:cli_send_smb(188)
>>>>> May 29 16:04:16 server2 winbindd[3649]: Error writing 104 bytes to client. -1 (Connection reset by peer)
>>>>> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0] libsmb/cliconnect.c:cli_session_setup_spnego(859)
>>>>> May 29 16:04:16 server2 winbindd[3649]: Kinit failed: Cannot contact any KDC for requested realm
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>

--
Rodrigo Azevedo Moreira da Silva
Departamento de Física
Universidade Federal de Pernambuco
http://www.df.ufpe.br
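The call_bail lines quoted throughout the thread are the client unwinding a
frame whose reply has not arrived within frame-timeout; the "frame-timeout =
1800" in the logs is the default of 1800 seconds. A minimal client volume
raising that limit, in the spirit of the pnc4 workaround at the top of the
thread; the host and subvolume names here are illustrative:

volume brick_1
  type protocol/client
  option transport-type tcp
  option remote-host server1       # illustrative host
  option remote-subvolume brick    # illustrative remote volume
  option frame-timeout 180000      # seconds; call_bail fires after this, default 1800
end-volume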