Hello Vikas,

I have now installed and tested my setup with 1.4.0rc3. The good news is that gluster no longer crashes on the intermediate level of the structure. The bad news is that AFR doesn't seem to work at all for me anymore. Even with a reduced setup using only 2 hosts and 2 sub-volumes for AFR on a local disk on one of the two, I can't write anything onto the volume. When I try to create a file on the exported volume, it hangs for a few minutes and then returns with "transport endpoint not connected".

Here are the two config and log files.

Cheers,
Rainer

log host 1 (client):

2008-12-17 11:06:40 D [glusterfs.c:297:_get_specfp] glusterfs: loading volume file /home/rainer/sources/gluster/software.farmcontrol-client-debug.vol
Version : glusterfs 1.4.0rc3 built on Dec 16 2008 10:23:44
TLA Revision : glusterfs--mainline--3.0--patch-777
Starting Time: 2008-12-17 11:06:40
Command line : glusterfs -f /home/rainer/sources/gluster/software.farmcontrol-client-debug.vol -l /var/log/gluster.log -L DEBUG /mnt/mnt1
given volfile
+-----
1: volume hlta01-client
2: type protocol/client
3: option transport-type tcp/client
4: option remote-host hlta01
5: option remote-subvolume head
6: end-volume
7:
8: volume afr-sw-farmcontrol
9: type cluster/afr
10: subvolumes hlta01-client
11: end-volume
12:
+-----
2008-12-17 11:06:40 D [spec.y:187:new_section] parser: New node for 'hlta01-client'
2008-12-17 11:06:40 D [xlator.c:394:xlator_set_type] xlator: attempt to load file /usr/lib64/glusterfs/1.4.0rc3/xlator/protocol/client.so
2008-12-17 11:06:40 D [spec.y:213:section_type] parser: Type:hlta01-client:protocol/client
2008-12-17 11:06:40 D [spec.y:288:section_option] parser: Option:hlta01-client:transport-type:tcp/client
2008-12-17 11:06:40 D [spec.y:288:section_option] parser: Option:hlta01-client:remote-host:hlta01
2008-12-17 11:06:40 D [spec.y:288:section_option] parser: Option:hlta01-client:remote-subvolume:head
2008-12-17 11:06:40 D [spec.y:372:section_end] parser: end:hlta01-client
2008-12-17 11:06:40 D [spec.y:187:new_section] parser: New node for 'afr-sw-farmcontrol'
2008-12-17 11:06:40 D [xlator.c:394:xlator_set_type] xlator: attempt to load file /usr/lib64/glusterfs/1.4.0rc3/xlator/cluster/afr.so
2008-12-17 11:06:40 D [spec.y:213:section_type] parser: Type:afr-sw-farmcontrol:cluster/afr
2008-12-17 11:06:40 D [spec.y:357:section_sub] parser: child:afr-sw-farmcontrol->hlta01-client
2008-12-17 11:06:40 D [spec.y:372:section_end] parser: end:afr-sw-farmcontrol
2008-12-17 11:06:40 D [xlator.c:394:xlator_set_type] xlator: attempt to load file /usr/lib64/glusterfs/1.4.0rc3/xlator/mount/fuse.so
2008-12-17 11:06:40 D [glusterfs.c:927:main] glusterfs: running in pid 8938
2008-12-17 11:06:40 D [client-protocol.c:5955:init] hlta01-client: defaulting transport-timeout to 42
2008-12-17 11:06:40 D [transport.c:118:transport_load] transport: attempt to load file /usr/lib64/glusterfs/1.4.0rc3/transport/socket.so
2008-12-17 11:06:40 D [client-protocol.c:6008:init] hlta01-client: defaulting limits.transaction-size to 268435456
2008-12-17 11:06:40 D [xlator.c:519:xlator_init_rec] hlta01-client: Initialization done
2008-12-17 11:06:40 D [client-protocol.c:6281:notify] hlta01-client: got GF_EVENT_PARENT_UP, attempting connect on transport
2008-12-17 11:06:40 D [client-protocol.c:6281:notify] hlta01-client: got GF_EVENT_PARENT_UP, attempting connect on transport
2008-12-17 11:06:40 D [inode.c:987:inode_table_new] fuse: creating new inode table with lru_limit=0
2008-12-17 11:06:40 D [inode.c:455:__inode_create] fuse/inode: create inode(0)
2008-12-17 11:06:41 D [client-protocol.c:5620:client_protocol_reconnect] hlta01-client: attempting reconnect
2008-12-17 11:06:41 D [name.c:183:af_inet_client_get_remote_sockaddr] hlta01-client: option remote-port missing in volume hlta01-client. Defaulting to 6996
2008-12-17 11:06:41 D [common-utils.c:213:gf_resolve_ip6] resolver: DNS cache not present, freshly probing hostname: hlta01
2008-12-17 11:06:41 D [common-utils.c:250:gf_resolve_ip6] resolver: returning ip-10.130.101.100 (port-6996) for hostname: hlta01 and port: 6996
2008-12-17 11:06:41 D [client-protocol.c:6313:notify] hlta01-client: got GF_EVENT_CHILD_UP
2008-12-17 11:06:41 D [socket.c:926:socket_connect] hlta01-client: connect () called on transport already connected
2008-12-17 11:06:41 D [client-protocol.c:5561:client_setvolume_cbk] hlta01-client: SETVOLUME on remote-host succeeded
2008-12-17 11:06:51 D [client-protocol.c:5629:client_protocol_reconnect] hlta01-client: breaking reconnect chain
2008-12-17 11:07:48 D [inode.c:280:__inode_activate] fuse/inode: activating inode(1), lru=0/0 active=1 purge=0
2008-12-17 11:07:48 D [fuse-bridge.c:455:fuse_lookup] glusterfs-fuse: 2: LOOKUP /test
2008-12-17 11:07:48 D [inode.c:455:__inode_create] fuse/inode: create inode(0)
2008-12-17 11:07:48 D [inode.c:280:__inode_activate] fuse/inode: activating inode(0), lru=0/0 active=2 purge=0
2008-12-17 11:07:48 D [fuse-bridge.c:406:fuse_entry_cbk] glusterfs-fuse: 2: LOOKUP() /test => -1 (No such file or directory)
2008-12-17 11:07:48 D [inode.c:323:__inode_retire] fuse/inode: retiring inode(0) lru=0/0 active=1 purge=1
2008-12-17 11:07:48 D [inode.c:455:__inode_create] fuse/inode: create inode(0)
2008-12-17 11:07:48 D [inode.c:280:__inode_activate] fuse/inode: activating inode(0), lru=0/0 active=2 purge=0
2008-12-17 11:07:48 D [fuse-bridge.c:1089:fuse_mknod] glusterfs-fuse: 3: MKNOD /test
2008-12-17 11:08:32 E [client-protocol.c:273:call_bail] hlta01-client: activating bail-out. pending frames = 1. last sent = 2008-12-17 11:07:49. last received = 2008-12-17 11:07:49. transport-timeout = 42
2008-12-17 11:08:32 C [client-protocol.c:308:call_bail] hlta01-client: bailing transport
2008-12-17 11:08:32 D [socket.c:183:__socket_disconnect] hlta01-client: shutdown() returned 0. setting connection state to -1
2008-12-17 11:08:32 D [socket.c:93:__socket_rwv] hlta01-client: EOF from peer 10.130.101.100:6996
2008-12-17 11:08:32 D [socket.c:568:socket_proto_state_machine] hlta01-client: socket read failed (Transport endpoint is not connected) in state 1 (10.130.101.100:6996)
2008-12-17 11:08:32 D [client-protocol.c:5652:protocol_client_cleanup] hlta01-client: cleaning up state in transport object 0x50fc00
2008-12-17 11:08:32 E [client-protocol.c:5712:protocol_client_cleanup] hlta01-client: forced unwinding frame type(1) op(MKNOD) reply=@0x518a10
2008-12-17 11:08:32 E [fuse-bridge.c:406:fuse_entry_cbk] glusterfs-fuse: 3: MKNOD() /test => -1 (Transport endpoint is not connected)
2008-12-17 11:08:32 E [socket.c:1189:socket_submit] hlta01-client: transport not connected to submit (priv->connected = 255)
2008-12-17 11:08:32 D [inode.c:323:__inode_retire] fuse/inode: retiring inode(0) lru=0/0 active=1 purge=1
2008-12-17 11:08:32 D [fuse-bridge.c:455:fuse_lookup] glusterfs-fuse: 4: LOOKUP /test
2008-12-17 11:08:32 D [inode.c:455:__inode_create] fuse/inode: create inode(0)
2008-12-17 11:08:32 D [inode.c:280:__inode_activate] fuse/inode: activating inode(0), lru=0/0 active=2 purge=0
2008-12-17 11:08:32 D [name.c:183:af_inet_client_get_remote_sockaddr] hlta01-client: option remote-port missing in volume hlta01-client. Defaulting to 6996
2008-12-17 11:08:32 D [common-utils.c:206:gf_resolve_ip6] resolver: flushing DNS cache
2008-12-17 11:08:32 D [common-utils.c:213:gf_resolve_ip6] resolver: DNS cache not present, freshly probing hostname: hlta01
2008-12-17 11:08:32 D [common-utils.c:250:gf_resolve_ip6] resolver: returning ip-10.130.101.100 (port-6996) for hostname: hlta01 and port: 6996
2008-12-17 11:08:32 E [fuse-bridge.c:406:fuse_entry_cbk] glusterfs-fuse: 4: LOOKUP() /test => -1 (Transport endpoint is not connected)
2008-12-17 11:08:32 D [inode.c:323:__inode_retire] fuse/inode: retiring inode(0) lru=0/0 active=1 purge=1
2008-12-17 11:08:32 D [client-protocol.c:6313:notify] hlta01-client: got GF_EVENT_CHILD_UP
2008-12-17 11:08:32 D [socket.c:926:socket_connect] hlta01-client: connect () called on transport already connected
2008-12-17 11:08:32 D [client-protocol.c:5561:client_setvolume_cbk] hlta01-client: SETVOLUME on remote-host succeeded
2008-12-17 11:08:33 D [client-protocol.c:5629:client_protocol_reconnect] hlta01-client: breaking reconnect chain

log host 2 (server):

2008-12-17 11:05:25 D [glusterfs.c:297:_get_specfp] glusterfs: loading volume file /home/rainer/sources/gluster/sw-farmctl.hlta01.vol
Version : glusterfs 1.4.0rc3 built on Dec 16 2008 10:23:44
TLA Revision : glusterfs--mainline--3.0--patch-777
Starting Time: 2008-12-17 11:05:25
Command line : glusterfsd -f /home/rainer/sources/gluster/sw-farmctl.hlta01.vol -L DEBUG -l /var/log/glusterfs/glusterfsd.log
given volfile
+-----
1: volume local-brick
2: type storage/posix
3: option directory /localdisk/gluster/sw
4: end-volume
5:
6: volume lock-brick
7: type features/locks
8: subvolumes local-brick
9: end-volume
10:
11: volume local-brick2
12: type storage/posix
13: option directory /localdisk/gluster/sw2
14: end-volume
15:
16: volume lock-brick2
17: type features/locks
18: subvolumes local-brick2
19: end-volume
20:
21: #volume hlta0101-client
22: # type protocol/client
23: # option transport-type tcp/client
24: # option remote-host hlta0101
25: # option remote-subvolume sw-brick
26: #end-volume
27:
28: #volume hlta0102-client
29: # type protocol/client
30: # option trasport-type tcp/client
31: # option remote-host hlta0102
32: # option remote-subvolume sw-brick
33: #end-volume
34:
35: #volume hlta0103-client
36: # type protocol/client
37: # option trasport-type tcp/client
38: # option remote-host hlta0103
39: # option remote-subvolume sw-brick
40: #end-volume
41:
42: #volume hlta0104-client
43: # type protocol/client
44: # option trasport-type tcp/client
45: # option remote-host hlta0104
46: # option remote-subvolume sw-brick
47: #end-volume
48:
49: volume afr-distributor
50: type cluster/afr
51: subvolumes lock-brick lock-brick2 #hlta0101-client
52: #hlta0102-client hlta0103-client hlta0104-client
53: end-volume
54:
55: volume head
56: type debug/trace
57: subvolumes afr-distributor
58: end-volume
59:
60: #volume head
61: # type performance/io-threads
62: # option thread-count 4 # deault is 1
63: # option cache-size 128MB
64: # subvolumes afr-distributor
65: #end-volume
66:
67: volume server
68: type protocol/server
69: option transport-type tcp/server
70: option auth.addr.head.allow *
71: subvolumes head
72: end-volume
+-----
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for 'local-brick'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to load file /usr/lib64/glusterfs/1.4.0rc3/xlator/storage/posix.so
2008-12-17 11:05:25 D [spec.y:213:section_type] parser: Type:local-brick:storage/posix
2008-12-17 11:05:25 D [spec.y:288:section_option] parser: Option:local-brick:directory:/localdisk/gluster/sw
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:local-brick
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for 'lock-brick'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to load file /usr/lib64/glusterfs/1.4.0rc3/xlator/features/locks.so
2008-12-17 11:05:25 D [xlator.c:434:xlator_set_type] xlator: dlsym(notify) on /usr/lib64/glusterfs/1.4.0rc3/xlator/features/locks.so: undefined symbol: notify -- neglecting
2008-12-17 11:05:25 D [spec.y:213:section_type] parser: Type:lock-brick:features/locks
2008-12-17 11:05:25 D [spec.y:357:section_sub] parser: child:lock-brick->local-brick
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:lock-brick
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for 'local-brick2'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to load file /usr/lib64/glusterfs/1.4.0rc3/xlator/storage/posix.so
2008-12-17 11:05:25 D [spec.y:213:section_type] parser: Type:local-brick2:storage/posix
2008-12-17 11:05:25 D [spec.y:288:section_option] parser: Option:local-brick2:directory:/localdisk/gluster/sw2
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:local-brick2
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for 'lock-brick2'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to load file /usr/lib64/glusterfs/1.4.0rc3/xlator/features/locks.so
2008-12-17 11:05:25 D [xlator.c:434:xlator_set_type] xlator: dlsym(notify) on /usr/lib64/glusterfs/1.4.0rc3/xlator/features/locks.so: undefined symbol: notify -- neglecting
2008-12-17 11:05:25 D [spec.y:213:section_type] parser: Type:lock-brick2:features/locks
2008-12-17 11:05:25 D [spec.y:357:section_sub] parser: child:lock-brick2->local-brick2
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:lock-brick2
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for 'afr-distributor'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to load file /usr/lib64/glusterfs/1.4.0rc3/xlator/cluster/afr.so
2008-12-17 11:05:25 D [spec.y:213:section_type] parser: Type:afr-distributor:cluster/afr
2008-12-17 11:05:25 D [spec.y:357:section_sub] parser: child:afr-distributor->lock-brick
2008-12-17 11:05:25 D [spec.y:357:section_sub] parser: child:afr-distributor->lock-brick2
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:afr-distributor
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for 'head'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to load file /usr/lib64/glusterfs/1.4.0rc3/xlator/debug/trace.so
2008-12-17 11:05:25 D [xlator.c:434:xlator_set_type] xlator: dlsym(notify) on /usr/lib64/glusterfs/1.4.0rc3/xlator/debug/trace.so: undefined symbol: notify -- neglecting
2008-12-17 11:05:25 D [spec.y:213:section_type] parser: Type:head:debug/trace
2008-12-17 11:05:25 D [spec.y:357:section_sub] parser: child:head->afr-distributor
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:head
2008-12-17 11:05:25 D [spec.y:187:new_section] parser: New node for 'server'
2008-12-17 11:05:25 D [xlator.c:394:xlator_set_type] xlator: attempt to load file /usr/lib64/glusterfs/1.4.0rc3/xlator/protocol/server.so
2008-12-17 11:05:25 D [spec.y:213:section_type] parser: Type:server:protocol/server
2008-12-17 11:05:25 D [spec.y:288:section_option] parser: Option:server:transport-type:tcp/server
2008-12-17 11:05:25 D [spec.y:288:section_option] parser: Option:server:auth.addr.head.allow:*
2008-12-17 11:05:25 D [spec.y:357:section_sub] parser: child:server->head
2008-12-17 11:05:25 D [spec.y:372:section_end] parser: end:server
2008-12-17 11:05:25 D [glusterfs.c:927:main] glusterfs: running in pid 24506
2008-12-17 11:05:25 D [transport.c:118:transport_load] transport: attempt to load file /usr/lib64/glusterfs/1.4.0rc3/transport/socket.so
2008-12-17 11:05:25 D [server-protocol.c:7596:init] server: defaulting limits.transaction-size to 4194304
2008-12-17 11:05:25 D [xlator.c:519:xlator_init_rec] local-brick: Initialization done
2008-12-17 11:05:25 D [xlator.c:519:xlator_init_rec] lock-brick: Initialization done
2008-12-17 11:05:25 D [xlator.c:519:xlator_init_rec] local-brick2: Initialization done
2008-12-17 11:05:25 D [xlator.c:519:xlator_init_rec] lock-brick2: Initialization done
2008-12-17 11:05:25 D [xlator.c:519:xlator_init_rec] afr-distributor: Initialization done
2008-12-17 11:05:25 C [dict.c:1067:data_to_str] dict: @data=(nil)
2008-12-17 11:05:25 C [dict.c:1067:data_to_str] dict: @data=(nil)
2008-12-17 11:07:48 N [trace.c:1237:trace_lookup] head: 3: (loc {path=/test, ino=0} need_xattr=1)
2008-12-17 11:07:48 N [trace.c:513:trace_lookup_cbk] head: 3: (op_ret=-1, op_errno=2)
2008-12-17 11:07:48 N [trace.c:1101:trace_entrylk] head: 4: (loc= {path=/, ino=1} basename=test, cmd=ENTRYLK_LOCK, type=ENTRYLK_WRLCK)
2008-12-17 11:07:48 N [trace.c:1021:trace_entrylk_cbk] head: 4: op_ret=0, op_errno=0
2008-12-17 11:07:48 N [trace.c:1189:trace_xattrop] head: 5: (path=/, ino=1 flags=0)
2008-12-17 11:07:48 E [posix.c:2419:posix_xattrop] local-brick: /: Numerical result out of range
2008-12-17 11:07:48 E [posix.c:2419:posix_xattrop] local-brick2: /: Numerical result out of range
2008-12-17 11:07:48 N [trace.c:1042:trace_xattrop_cbk] head: 5: (op_ret=0, op_errno=34)
2008-12-17 11:07:48 N [trace.c:1307:trace_mknod] head: 6: (loc {path=/test, ino=0}, mode=33188, dev=0)

On Mon, 2008-12-15 at 21:24 +0530, Vikas Gorur wrote:
> Rainer,
>
> Thank you for your interest in GlusterFS.
>
> I do not know of any user who's had an AFR configuration with 40-50
> subvolumes, but there is no reason it shouldn't work. The write
> performance will obviously be quite low, but in your case since you
> will not be making heavy/daily use of it (the only writes will be when
> you make a new release, if I understand correctly), that shouldn't be
> an issue.
>
> The version of GlusterFS you're using (1.3.12) is rather old now. We
> have a new release 1.4.0 in the final stages of testing. We haven't
> yet completely tested the AFR-over-AFR setup yet.
>
> You could either wait a few days (less than a week) for us to make the
> RC1 release with AFR-over-AFR tested or grab the TLA repository
> version and give it a try.
>
> Vikas
> --
> Engineer - Z Research
> http://gluster.com/