Re: SSL_read() returning SSL_ERROR_SYSCALL with errno 11 EAGAIN

Matt Caswell <matt@xxxxxxxxxxx> · Wed, 1 May 2019 08:42:06 +0100

On 30/04/2019 23:37, Viktor Dukhovni wrote:
> On Tue, Apr 30, 2019 at 03:23:23PM -0700, Erik Forsberg wrote:
> 
>>> Is the handshake explicit, or does the application just call
>>> SSL_read(), with OpenSSL performing the handshake as needed?
>>
>> I occasionally (somewhat rarely) see the issue mentioned by the OP.
>> Ignoring the error, or mapping it and doing what WANT_READ/WANT_WRITE
>> does, effectively hides the issue and the connection works fine. I predominantly
>> run on Solaris 11. In my case, I open the socket myself, set non-blocking
>> mode and associate it with an SSL object using SSL_set_fd().
>> The initial handshake is done explicitly.
> 
> Recoverable errors should not result in SSL_ERROR_SYSCALL.  This
> feels like a bug.  I'd like to hear from Matt Caswell on this one.
> Perhaps someone should open an issue on GitHub...
> 

SSL_ERROR_SYSCALL should not be raised as a result of a recoverable error. It
should always be considered fatal. If you are getting it but errno says EAGAIN,
then a number of possibilities spring to mind:

1) If a fatal error has occurred, SSL_get_error() checks to see if there is an
error on the OpenSSL error queue. If there is, it returns SSL_ERROR_SSL (unless
the error type is ERR_LIB_SYS, in which case it returns SSL_ERROR_SYSCALL). If
there is no error at all, but libssl doesn't think the error is recoverable,
then it will return SSL_ERROR_SYSCALL by default.
It is possible that libssl has encountered some non-syscall related error but
neglected to push an error onto the error queue. Thus the return value
incorrectly indicates SSL_ERROR_SYSCALL when it should have been SSL_ERROR_SSL.
This would be an OpenSSL bug, but quite tricky to find, since we'd have to
locate the spot where no error is being pushed... and because there is no error,
we don't have a lot to go on!

2) A second possibility is that it really was a syscall that failed but
something (either in libssl or possibly in application code) made some
subsequent syscall that changed errno in the meantime. If that "something" was
in libssl, then that's probably also a libssl bug (also quite tricky to track down).

3) A third possibility is that it really is a retryable error but libssl failed
to properly set its state to note that. I think this is quite a lot less likely
than (1) or (2) but would also be a libssl bug.
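The classification described in (1) to (3) can be illustrated with a simplified model. This is not OpenSSL's source: the enum and model_get_error() are hypothetical, and the three flags stand for the state the error queue and libssl would be in after a failed call. The point it shows is that both a missing error-queue entry (1) and a missing retry flag (3) fall through to SSL_ERROR_SYSCALL, at which point errno is all the application has to go on, which is exactly where (2) can mislead.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SSL_get_error() results discussed above.
   The real constants are defined in <openssl/ssl.h>. */
typedef enum {
    MODEL_ERROR_WANT_RETRY,   /* e.g. SSL_ERROR_WANT_READ / SSL_ERROR_WANT_WRITE */
    MODEL_ERROR_SSL,          /* SSL_ERROR_SSL */
    MODEL_ERROR_SYSCALL       /* SSL_ERROR_SYSCALL */
} model_ssl_error;

/* Simplified model of how a failed I/O call is classified:
   queue_has_error - an error has been pushed onto the OpenSSL error queue
   queued_is_sys   - that error's library is ERR_LIB_SYS
   retryable       - libssl recorded that the operation can be retried */
model_ssl_error model_get_error(bool queue_has_error, bool queued_is_sys,
                                bool retryable)
{
    if (queue_has_error)
        return queued_is_sys ? MODEL_ERROR_SYSCALL : MODEL_ERROR_SSL;
    if (retryable)
        return MODEL_ERROR_WANT_RETRY;
    /* Nothing queued and not retryable: the default is SSL_ERROR_SYSCALL,
       even if the real cause was a non-syscall failure (possibility 1)
       or a retryable condition libssl failed to flag (possibility 3). */
    return MODEL_ERROR_SYSCALL;
}
```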

So my guess is that, except where the application itself has accidentally
changed errno, this most likely indicates an OpenSSL bug. The safest thing to do
in such circumstances is to treat this as a fatal error. It is very unwise to
retry a connection where the library has indicated a fatal error (e.g. see
CVE-2019-1559).

What OpenSSL version is this?

Matt