> The more relevant question would be with TCP_KEEPALIVE and TCP_USER_TIMEOUT
> on sockets, do we really need ping-pong framework in Clients? We might need
> that in transport/rdma setups, but my question is concentrating on
> transport/rdma. In other words would like to hear why do we need heart-beat
> mechanism in the first place. One scenario might be a healthy socket level
> connection but an unhealthy brick/client (like a deadlocked one).

This is an important case to consider. On the one hand, I think it answers
your question about TCP_KEEPALIVE. What we really care about is whether a
brick's request queue is moving. In other words, what's the time since the
last reply from that brick, and does that time exceed some threshold? On a
busy system, we don't even need ping packets to know that. We can just use
responses on other requests to set/reset that timer. We only need to send
ping packets when our *outbound* queue has remained empty for some fraction
of our timeout.

However, it's important that our measurements be *end to end* and not just
at the transport level. This is particularly true with multiplexing, where
multiple bricks will share and contend on various resources. We should ping
*through* client and server, with separate translators above and below each.
This would give us a true end-to-end ping *for that brick*, and also keep
the code nicely modular.

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel
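
For illustration, the timer logic described in the reply (any reply from a
brick resets the liveness timer, and a ping is only needed once the outbound
side has been idle for some fraction of the timeout) might be sketched
roughly as below. This is only a sketch, not Gluster code: the class name
BrickLiveness, its methods, and the idle_fraction default of 0.5 are all
hypothetical, and "outbound queue empty" is approximated here as "time since
the last request was sent". Timestamps are passed in explicitly so the logic
stays independent of any particular clock.

```python
class BrickLiveness:
    """Per-brick liveness tracker (hypothetical sketch, not a Gluster API)."""

    def __init__(self, timeout, idle_fraction=0.5, now=0.0):
        if timeout <= 0 or not (0 < idle_fraction < 1):
            raise ValueError("timeout must be > 0 and 0 < idle_fraction < 1")
        self.timeout = timeout
        # Assumed policy: ping once we have sent nothing for this long.
        self.ping_after = timeout * idle_fraction
        self.last_reply = now
        self.last_send = now

    def on_send(self, now):
        # Any outbound request (including a ping) counts as activity.
        self.last_send = now

    def on_reply(self, now):
        # A reply to *any* request, not only to a ping, resets the timer.
        self.last_reply = now

    def should_ping(self, now):
        # Ping only if the outbound side has been idle for the chosen
        # fraction of the timeout; on a busy system this stays False.
        return now - self.last_send >= self.ping_after

    def is_stalled(self, now):
        # The brick is suspect if nothing has come back within the timeout,
        # regardless of whether the socket itself still looks healthy.
        return now - self.last_reply > self.timeout
```

A caller would invoke on_send() and on_reply() from its request path, and
check should_ping() and is_stalled() from a periodic timer. For the check to
be end to end in the sense above, the ping would have to be an ordinary
request that travels through the client and server layers to that specific
brick, rather than a transport-level keepalive.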