I installed 1.4.0rc2 tonight. I had been running rc1 since its release; this problem probably existed there too, but I never tested for it. I have two servers that replicate to each other with AFR. I rebooted one, and while it was down, the GlusterFS mount on the other server hung. Once the rebooted server came back up (i.e., became pingable on the network), the glusterfs process on the surviving server crashed. I remounted the filesystem and it began auto-healing. Here's the log from the server that was not rebooted (the other server's log contains nothing but the normal startup messages):

Version      : glusterfs 1.4.0rc2 built on Dec 13 2008 22:36:17
TLA Revision : glusterfs--mainline--3.0--patch-770
Starting Time: 2008-12-13 22:43:12
Command line : /usr/local/sbin/glusterfs --log-level=WARNING --volfile=/etc/glusterfs/glusterfs-home.vol /home
given volfile
+-----
1: ### file: server-volume.spec.sample
2:
3: ##############################################
4: ### GlusterFS Server Volume Specification ##
5: ##############################################
6:
7: #### CONFIG FILE RULES:
8: ### "#" is comment character.
9: ### - Config file is case sensitive
10: ### - Options within a volume block can be in any order.
11: ### - Spaces or tabs are used as delimitter within a line.
12: ### - Multiple values to options will be : delimitted.
13: ### - Each option should end within a line.
14: ### - Missing or commented fields will assume default values.
15: ### - Blank/commented lines are allowed.
16: ### - Sub-volumes should already be defined above before referring.
17:
18: ### Export volume "home1" with the contents of "/home/export" directory.
19: volume home1
20: type storage/posix # POSIX FS translator
21: option directory /gluster/home # Export this directory
22: end-volume
23:
24: volume posix-locks-home1
25: type features/posix-locks
26: option mandatory on
27: subvolumes home1
28: end-volume
29:
30: ## Reference volume "home2" from remote server
31: volume home2
32: type protocol/client # POSIX FS translator
33: option transport-type tcp/client
34: option remote-host 192.168.2.2 # IP address of remote host
35: option remote-subvolume posix-locks-home1 # use home1 on remote host
36: option transport-timeout 10 # value in seconds; it should be set relatively low
37: end-volume
38:
39: ### Add network serving capability to above home.
40: volume server
41: type protocol/server
42: option transport-type tcp/server # For TCP/IP transport
43: subvolumes posix-locks-home1
44: option auth.addr.posix-locks-home1.allow 192.168.2.2,127.0.0.1 # Allow access to "home1" volume
45: end-volume
46:
47: ### Create automatic file replication
48: volume home
49: type cluster/afr
50: option read-subvolume posix-locks-home1
51: subvolumes posix-locks-home1 home2
52: # subvolumes posix-locks-home1
53: end-volume
54:
55: #volume threads1
56: # type performance/io-threads
57: # option thread-count 2
58: # option cache-size 32MB
59: # subvolumes home
60: #end-volume
+-----
2008-12-13 22:47:53 W [afr-self-heal-common.c:985:afr_self_heal] home: performing self heal on /ac/mail (metadata=0 data=0 entry=1)
2008-12-13 22:47:53 W [afr-self-heal-entry.c:1620:afr_sh_entry_impunge_all] home: impunging entries of /ac/mail on posix-locks-home1 to other sinks
2008-12-13 22:47:53 W [afr-self-heal-entry.c:858:afr_sh_entry_expunge_all] home: expunging entries of /ac/mail on home2 to other sinks
2008-12-13 22:47:53 E [posix.c:1834:posix_release] home1: pfd->dir is 0x17e2ac0 (not NULL) for file fd=0x17e1850
2008-12-13 22:47:53 W [afr-self-heal-entry.c:70:afr_sh_entry_done] home: self heal of /ac/mail completed
2008-12-14 00:10:24 E [client-protocol.c:273:call_bail] home2: activating bail-out. pending frames = 45. last sent = 2008-12-14 00:10:08. last received = 2008-12-14 00:07:57. transport-timeout = 10
2008-12-14 00:10:24 C [client-protocol.c:308:call_bail] home2: bailing transport
2008-12-14 00:10:24 E [client-protocol.c:5728:protocol_client_cleanup] home2: forced unwinding frame type(3) op(RELEASE) reply=@0x17675d0
2008-12-14 00:10:24 E [client-protocol.c:5712:protocol_client_cleanup] home2: forced unwinding frame type(1) op(LOOKUP) reply=@0x17675d0
2008-12-14 00:10:24 E [socket.c:1189:socket_submit] home2: transport not connected to submit (priv->connected = 255)
2008-12-14 00:10:24 W [common-utils.c:156:gf_print_bytes] glusterfs: Total data (in bytes): transfered (55132079), received (40565752)
pending frames:
frame : type(1) op(12)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
frame : type(1) op(32)
Signal received: 11
configuration details:
argp 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
tv_nsec 1
package-string: glusterfs 1.4.0rc2

Version      : glusterfs 1.4.0rc2 built on Dec 13 2008 22:36:17
TLA Revision : glusterfs--mainline--3.0--patch-770
Starting Time: 2008-12-14 00:10:38
Command line : /usr/local/sbin/glusterfs --log-level=WARNING --volfile=/etc/glusterfs/glusterfs-home.vol /home
given volfile
+-----
[volfile lines 1-60 identical to the listing above; omitted]
+-----
2008-12-14 00:10:39 E [socket.c:710:socket_connect_finish] home2: connection failed (Connection refused)
2008-12-14 00:10:43 E [client-protocol.c:135:this_fd_get] home2: failed to get remote fd number for fd_t(0x1552360)
2008-12-14 00:10:43 E [client-protocol.c:2634:client_lk] home2: failed to get remote fd from fd_t(0x1552360). returning EBADFD
2008-12-14 00:10:43 E [client-protocol.c:135:this_fd_get] home2: failed to get remote fd number for fd_t(0x1552360)
2008-12-14 00:10:43 E [client-protocol.c:2634:client_lk] home2: failed to get remote fd from fd_t(0x1552360). returning EBADFD
2008-12-14 00:11:18 W [afr-self-heal-common.c:985:afr_self_heal] home: performing self heal on /sharedtmp (metadata=0 data=0 entry=1)
2008-12-14 00:11:18 W [afr-self-heal-entry.c:1620:afr_sh_entry_impunge_all] home: impunging entries of /sharedtmp on posix-locks-home1 to other sinks
2008-12-14 00:11:18 W [afr-self-heal-entry.c:858:afr_sh_entry_expunge_all] home: expunging entries of /sharedtmp on home2 to other sinks
2008-12-14 00:11:18 W [afr-self-heal-entry.c:70:afr_sh_entry_done] home: self heal of /sharedtmp completed