I still get spurious disconnects with 3.4.0alpha3. While testing, I also note that this patch has not been pulled up to the 3.4 branch, although it fixes a problem I encountered on alpha2:
http://review.gluster.com/#/c/4588/

Here is the first occurrence of a spurious disconnect on the client side (I added debug messages):

[2013-04-17 21:07:47.198612] E [socket.c:487:__socket_rwv] 0-gfs33-client-2: EOF on socket (errno = 0, opcount = 1, opvector[0].iov_len = 4)
[2013-04-17 21:07:47.198824] W [socket.c:515:__socket_rwv] 0-gfs33-client-2: readv failed (No message available)
[2013-04-17 21:07:47.198947] W [socket.c:1963:__socket_proto_state_machine] 0-gfs33-client-2: reading from socket failed. Error (No message available), peer (192.0.2.103:49153)
[2013-04-17 21:07:47.199000] I [client.c:2097:client_rpc_notify] 0-gfs33-client-2: disconnected
[2013-04-17 21:07:47.266289] W [client-rpc-fops.c:1640:client3_3_entrylk_cbk] 0-gfs33-client-2: remote operation failed: Socket is not connected

In socket.c, EOF is declared because ret is 0. ret may come from iov_load() or from readv(); I have not yet determined which one is the culprit.

On the brick side, I get this:

[2013-04-17 21:07:47.208168] E [event-poll.c:346:event_dispatch_poll_handler] 0-poll: index not found for fd=8 (idx_hint=5)

A tcpdump running at the same time on the brick side reports a TCP RST at 22:07:47.208163. Since, as I recall, glusterfs does not log in local time, I believe this corresponds to 21:07:47.208163 in the glusterfs logs. There is also a small clock skew between client (offset -0.000732) and brick (offset -0.006740), which means the brick is 6008 µs behind the client.

As I understand it, that means the TCP reset happens after the ret = 0 in socket.c:487. I therefore strongly suspect iov_load().

Opinions? Any hint?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@xxxxxxxxxx