Hi

I am still testing high availability on release-3.3. I have a 2 replica setup, and it is busy building NetBSD [1].

If I kill glusterfsd on one server, leaving the other one alive, all ongoing operations fail, and the mount is wrecked beyond recovery:

client$ ls
ls: .: Socket is not connected
client$ cd /
client$ cd -
/bin/ksh: cd: /pfs/manu/netbsd/usr/src - Socket is not connected

I have to restart the stopped brick in order to use the mount again.

I tried reproducing the issue with something simpler than a huge build, without success so far: the mount survives a brick being stopped, and I do not even get failures on ongoing operations. I would appreciate some hints on how to reproduce the problem in a simple test case.

Here is the beginning of the client log at failure time:

[2012-06-18 09:14:24.421370] W [socket.c:1512:__socket_proto_state_machine] 0-pfs-client-0: reading from socket failed. Error (Socket is not connected), peer (193.54.82.99:24010)
[2012-06-18 09:14:24.443002] E [rpc-clnt.c:373:saved_frames_unwind] 0-pfs-client-0: forced unwinding frame type(GlusterFS 3.1) op(INODELK(29)) called at 2012-06-18 09:14:23.516562 (xid=0x9483261x)
[2012-06-18 09:14:24.443448] W [client3_1-fops.c:1495:client3_1_inodelk_cbk] 0-pfs-client-0: remote operation failed: Socket is not connected
[2012-06-18 09:14:24.443660] I [afr-lk-common.c:1006:afr_lock_blocking] 0-pfs-replicate-0: unable to lock on even one child
[2012-06-18 09:14:24.443735] I [afr-transaction.c:994:afr_post_blocking_inodelk_cbk] 0-pfs-replicate-0: Blocking inodelks failed.
[2012-06-18 09:14:24.443903] W [fuse-bridge.c:788:fuse_setattr_cbk] 0-glusterfs-fuse: 7435303: SETATTR() /manu/netbsd/usr/src/destdir.i386/usr/include/sys/featuretest.h => -1 (Socket is not connected)
[2012-06-18 09:14:24.445857] I [socket.c:2315:socket_submit_request] 0-pfs-client-0: not connected (priv->connected = 0)
[2012-06-18 09:14:24.446190] W [rpc-clnt.c:1498:rpc_clnt_submit] 0-pfs-client-0: failed to submit rpc-request (XID: 0x9483576x Program: GlusterFS 3.1, ProgVers: 330, Proc: 41) to rpc-transport (pfs-client-0)
[2012-06-18 09:14:24.446665] E [rpc-clnt.c:373:saved_frames_unwind] 0-pfs-client-0: forced unwinding frame type(GlusterFS 3.1) op(INODELK(29)) called at 2012-06-18 09:14:23.517840 (xid=0x9483266x)
[2012-06-18 09:14:24.447066] W [client3_1-fops.c:1495:client3_1_inodelk_cbk] 0-pfs-client-0: remote operation failed: Socket is not connected
[2012-06-18 09:14:24.447364] I [afr-lk-common.c:1006:afr_lock_blocking] 0-pfs-replicate-0: unable to lock on even one child
[2012-06-18 09:14:24.447666] I [afr-transaction.c:994:afr_post_blocking_inodelk_cbk] 0-pfs-replicate-0: Blocking inodelks failed.
[2012-06-18 09:14:24.448142] W [fuse-bridge.c:788:fuse_setattr_cbk] 0-glusterfs-fuse: 7435323: SETATTR() /manu/netbsd/usr/src/destdir.i386/usr/include/sys/featuretest.h => -1 (Socket is not connected)

[1] For anyone willing to reproduce, get and unpack gnusrc.tgz, src.tgz, syssrc.tgz and sharesrc.tgz from ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-5.1.2/source/sets/
Then: cd usr/src && ./build.sh -Uuo release

-- 
Emmanuel Dreyfus
manu@xxxxxxxxxx