- We have a server that has around 2025 clients connected at any instant.
- Our application creates a Server /Listener socket that then is converted into a Secure socket using OpenSSL library. This is compiled and built in a Windows x64 environment. We also
built the OpenSSL for the Windows. The Listener socket is created with a default backlog of 500. The Accept socket is non-blocking socket and waits for connections
- Every Client makes a regular blocking connection to the Server. The Server accepts the connection after which the Client socket is converted to a secure socket using the OpenSSL Library.
- The connections are coming at a rate of about 10 connections /second ? Not sure about this number.
- We are able to connect to all the clients in a few minutes and it stays like that for some time. There constant exchange of messages between Server(COS) and clients without issues.
- The application logic is to keep trying to connect every timeout.
- After maybe a few hours/days we see the clients dropping connections. The logs indicate the SSL_Read or SSL_Write on the Server fails for a client with SSL_Error number 5 (SSL_ERROR_SYSCALL)
and the equivalent Windows error of WSATimeOut. We then observe the WSAECONNRESET as the Client closed connection. We see this behavior for multiple sites.
- The number of Clients disconnected starts increasing and we see the logs in the Client where the server refuses any more connections form Clients (10061- WSAECONNREFUSED) There is
nothing to indicate this state in the server logs. Our theory is the backlog is filled and Server refusing further connections.
- We are trying to find why we get the SSL_Read/SSL_Write Error as it a Blocking socket. We cannot use to a non-blocking socket due to platform and application limitation
|