(Top posting to match what Mr. André does):
TCP without keepalive will time out the connection a few minutes after
sending any data that doesn't get a response.
TCP without keepalive with no outstanding send (so only a blocking
recv) and nothing outstanding at the other end will probably hang
almost forever as there is nothing indicating that there is actual
data lost in transit.
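For reference, a minimal sketch of turning keepalive on for a single
socket on Linux. SO_KEEPALIVE is off by default on a newly created
socket, so the application has to ask for it. The function names
enable_keepalive and keepalive_enabled, and the timing values in the
usage note, are made up for illustration; TCP_KEEPIDLE, TCP_KEEPINTVL
and TCP_KEEPCNT are the Linux-specific per-socket tunables.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive on fd (Linux-specific tunables).
 * idle_secs:     idle time before the first probe is sent
 * interval_secs: time between unanswered probes
 * probes:        unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int enable_keepalive(int fd, int idle_secs, int interval_secs, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof idle_secs) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &interval_secs, sizeof interval_secs) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof probes) != 0)
        return -1;
    return 0;
}

/* Read the flag back: 1 if keepalive is on, 0 if off, -1 on error. */
int keepalive_enabled(int fd)
{
    int on = 0;
    socklen_t len = sizeof on;

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, &len) != 0)
        return -1;
    return on != 0;
}
```

With OpenSSL this would be applied to the underlying descriptor (the
one handed to SSL_set_fd), for example enable_keepalive(fd, 60, 10, 5).
With those example values a vanished peer should be noticed after
roughly 60 + 5 * 10 seconds of silence, at which point the blocked
read fails instead of hanging.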
On 2020-11-13 17:13, Brice André wrote:
Hello,
And many thanks for the answer.
"Does the server parent process close its copy of the conversation
socket?" : I checked my code, and it seems it does not. Is that needed?
Could it explain my problem?
"Do you have keepalives enabled?" To be honest, I did not know it was
possible not to enable them. I checked with the command "netstat -tnope",
and it tells me that keepalive is not enabled.
I suppose that if, for some reason, communication with the client
is lost (client crash, network loss, etc.) and keepalive is not
enabled, this may fully explain my problem?
If so, do you have any idea why keepalive is not enabled? I
thought it was enabled by default on Linux.
Many thanks,
Brice
On Fri, 13 Nov 2020 at 15:43, Michael Wojcik
<Michael.Wojcik@xxxxxxxxxxxxxx> wrote:
> From: openssl-users <openssl-users-bounces@xxxxxxxxxxx> On Behalf Of Brice André
> Sent: Friday, 13 November, 2020 05:06
> ... it seems that in some rare execution cases, the server performs a SSL_read,
> the client disconnects in the meantime, and the server never detects the
> disconnection and remains stuck in the SSL_read operation.
...
> #0 0x00007f836575d210 in __read_nocancel () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00007f8365c8ccec in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
> #2 0x00007f8365c8772b in BIO_read () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
So OpenSSL is in a blocking read of the socket descriptor.
> tcp 0 0 5.196.111.132:5413 85.27.92.8:25856 ESTABLISHED 19218/./MabeeServer
> tcp 0 0 5.196.111.132:5412 85.27.92.8:26305 ESTABLISHED 19218/./MabeeServer
> From this log, I can see that I have two established connections with the remote
> client machine on IP 109.133.193.70. Note that it's normal to have two connections
> because my client-server protocol relies on two distinct TCP connections.
So the client has not, in fact, disconnected.
When a system closes one end of a TCP connection, the stack will send a TCP packet
with either the FIN or the RST flag set. (Which one you get depends on whether the
stack on the closing side was holding data for the conversation which the application
hadn't read.)
The sockets are still in ESTABLISHED state; therefore, no FIN or RST has been
received by the local stack.
There are various possibilities:
- The client system has not in fact closed its end of the conversation. Sometimes
  this happens for reasons that aren't immediately apparent; for example, if the
  client forked and allowed the descriptor for the conversation socket to be inherited
  by the child, and the child still has it open.
- The client system shut down suddenly (crashed) and so couldn't send the FIN/RST.
- There was a failure in network connectivity between the two systems, and consequently
  the FIN/RST couldn't be received by the local system.
- The connection is in a state where the peer can't send the FIN/RST, for example
  because the local side's receive window is zero. That shouldn't be the case, since
  OpenSSL is (apparently) blocked in a receive on the connection, but as I don't have
  the complete picture I can't rule it out.
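The first possibility is easy to reproduce locally. The sketch below is
an illustration only, not code from this thread: it uses an AF_UNIX
socketpair in place of a TCP connection, and dup() in place of a
descriptor inherited across fork() (both leave a second reference to the
same socket). The names peer_closed and inherited_copy_demo are
hypothetical. The reading side sees end-of-stream only once the last
copy of the peer's descriptor is closed; over TCP that is the point at
which the FIN would go out.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Non-blocking check: 1 if the peer has closed (recv reports
 * end-of-stream), 0 if the connection still looks open, -1 on error. */
static int peer_closed(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n == 0)
        return 1;                       /* orderly shutdown seen */
    if (n > 0)
        return 0;                       /* data pending, still open */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                       /* nothing to read, still open */
    return -1;
}

/* Closes one copy of the peer descriptor, then the other, and records
 * what the reading side observes after each step. */
int inherited_copy_demo(int *after_first_close, int *after_last_close)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Stand-in for the copy a forked child would inherit. */
    int copy = dup(sv[1]);
    if (copy < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    close(sv[1]);                       /* "the application closed it" */
    *after_first_close = peer_closed(sv[0]);

    close(copy);                        /* last reference goes away */
    *after_last_close = peer_closed(sv[0]);

    close(sv[0]);
    return 0;
}
```

After the first close the reader still sees an open connection; only
the second close produces end-of-stream.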
> This leads me to think that the connection on which the SSL_read is listening is
> definitively dead (no more TCP keepalive)
"definitely dead" doesn't have any meaning in TCP. That's not one of the TCP states,
or part of the other TCP or IP metadata associated with the local port (which is
what matters).
Do you have keepalives enabled?
> and that, for a reason I do not understand, the SSL_read stays blocked in it.
The reason is simple: The connection is still established, but there's no data to
receive. The question isn't why SSL_read is blocking; it's why you think the
connection is gone, but the stack thinks otherwise.
> Note that the normal behavior of my application is: client connects, server
> daemon forks a new instance,
Does the server parent process close its copy of the conversation socket?
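If the parent keeps its copy open, the child's close() is not the last
one and the connection is not actually shut down. A rough sketch of the
usual pattern, assuming the connection has already been accepted;
spawn_worker, conn_handler and send_greeting are hypothetical names,
not taken from the server discussed here.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection handler; runs in the child process. */
typedef void (*conn_handler)(int conn_fd);

/* Trivial example handler: send two bytes and return. */
void send_greeting(int conn_fd)
{
    (void)send(conn_fd, "hi", 2, 0);
}

/* Hand an accepted connection to a forked child.
 * Returns the child's pid, or -1 if fork() failed (in which case the
 * caller still owns conn_fd). The caller is expected to reap the child
 * later, e.g. with waitpid() or a SIGCHLD handler. */
pid_t spawn_worker(int conn_fd, conn_handler handler)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: serve the connection, then close and exit.
         * A real server would also close the listening socket here. */
        handler(conn_fd);
        close(conn_fd);
        _exit(0);
    }

    /* Parent: drop its copy, so the child's close() is the last one
     * and the peer is told when the conversation ends. */
    close(conn_fd);
    return pid;
}
```

Without the close() in the parent branch, the peer would never see the
end of the conversation for as long as the parent keeps running.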
Enjoy
Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded