Yesterday during an ffsb run on the ceph kernel client, both the client
and the osd processes hit the max open fd limit (there was only one osd
up at the time).  There were 1006 sockets in the CLOSING state on the
client, and 1006 in the FIN_WAIT2 state on the osd.

From the TCP state machine [1], it seems that the sequence of events
was something like this, with both sides initially in the ESTABLISHED
state:

Kernel Client                      OSD
           |                        |
Send FIN,  |\                      /| Send FIN,
go to      | \                    / | go to
FIN_WAIT1  |  \                  /  | FIN_WAIT1
           |   \                /   |
           |    \              /    |
Recv FIN   |<-----------------/     |
           |      \                 |
Send ACK,  |-------\--------------->| Recv ACK,
go to      |        \               | go to FIN_WAIT2
CLOSING    |         \              |
           |          -------------x| FIN not read

That is, after closing its half of the connection, the osd isn't
reading anything from the socket anymore, and thus never sees the FIN
from the client.  We have bug #1803 to track this, but we should make
sure libceph in the kernel handles a simultaneous TCP connection close
correctly as well.

[1] http://www.tcpipguide.com/free/diagrams/tcpfsm.png
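
For comparison, here's a minimal userspace sketch of the
shutdown-then-drain pattern that avoids this kind of fd pile-up: after
sending our FIN with shutdown(), keep reading until read() returns 0 so
the peer's FIN is actually consumed, then close the fd.  This is not
taken from the ceph tree; close_draining() and its details are made up
for illustration.

    #include <errno.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Sketch only, not ceph code.  Half-close, drain, then release
     * the fd.  Returns 0 on success or -errno on failure. */
    static int close_draining(int sock_fd)
    {
            char buf[4096];
            ssize_t n;

            /* Send our FIN; we move to FIN_WAIT1 (or CLOSING if the
             * peer's FIN crosses ours, as in the trace above). */
            if (shutdown(sock_fd, SHUT_WR) < 0 && errno != ENOTCONN)
                    return -errno;

            /* Read until EOF so the peer's FIN is consumed rather
             * than sitting in the receive queue unread. */
            do {
                    n = read(sock_fd, buf, sizeof(buf));
            } while (n > 0 || (n < 0 && errno == EINTR));

            /* Release the fd so dead sockets don't accumulate. */
            return close(sock_fd);
    }

A real messenger would of course bound the drain with poll() and a
timeout so a misbehaving peer can't hold the fd open forever; the point
is just that someone has to observe the EOF before the fd can go away.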