I agree that the group lock patch fixes the original issue
and I can confirm that with this patch there isn't a crash.
But additional check adds extra avoidance if something goes wrong.
if (ssock->ssl_state == SSL_STATE_NULL || ssock->last_err == SSL_ERROR_SYSCALL || !ssock->ossl_wbio)
if (ssock->ssl_state == SSL_STATE_NULL || ssock->last_err == SSL_ERROR_SYSCALL || !ssock->ossl_ssl)
May be add this check if PJ_ENABLE_EXTRA_CHECK.
About 'fatal error'.
Looking at opessnl s_client.c/s_server.c I found that the next errors brought to SSL_shutdown
SSL_ERROR_ZERO_RETURN
SSL_ERROR_SYSCALL
SSL_ERROR_SSL
So need to check all of these errors.
I couldn't reproduce the crash on old openssl version because I could only trigger SSL_ERROR_SYSCALL error,
but I think the crash could only happen on SSL_ERROR_ZERO_RETURN.
What do you think?
Regards,
Alexei
Monday, October 17, 2016, 8:55:17 AM, you wrote:
Hi Alexei, Please don't misunderstand, sorry for not being so informative about our hesitation to include some parts of the patch. The main and original issue is the crash, which from the call stack trace it seems to be caused by two threads accessing write BIO at the same time (one for shutting down connection, and the other for sending some app data). The issue in above scenario should have been addressed by https://trac.pjsip.org/repos/changeset/5459 via group lock, i.e: SSL shutdown only occurs after app finishes with the SSL socket by explicitly invoking pj_ssl_sock_close(), instead of after an SSL error in read operation whose timing is not controllable by application, so now both SSL shutdown and sending app data timings are controlled by application. We prefer group lock solution because it could also solve some other issues such as possibility of write_mutex is prematurely destroyed by application from on_data_read() callback while other thread is sending data. And then you brought up the issue of "reusing SSL object after a fatal error", this is a new/different issue to us as it is not the cause of the crash in the original report. The base of the issue is this commit message: -- Reusing an SSL object when it has encountered a fatal error can have bad consequences. This is a bug in application code not libssl but libssl should be more forgiving and not crash. -- which unfortunately doesn't really provide a clear definition of 'fatal error', we've tried to search around but unfortunatly cannot find much about it. The latest patch seems to handle SSL_ERROR_SYSCALL case well (even we are not sure whether it is a fatal error and causes crash), but how about other fatal errors. So for now, perhaps it is better to do some more tests first, especially with older OpenSSL version, i.e: before the Apr 23, 2015 commit. BR, nanang On Fri, Oct 14, 2016 at 10:51 PM, Alexei Gradinari <alex2grad@xxxxxxxxx> wrote: Hello Teluu, What's wrong with my last patch? Could you, please, review it? I don't understand why you are so against these changes? Regards, Alexei Friday, October 14, 2016, 4:48:14 AM, you wrote:
-- Best regards, Alexei mailto:alex2grad@xxxxxxxxx -- |
_______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip@xxxxxxxxxxxxxxx http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org