I'm going to test both versions of the patch this weekend on my own pbx. I usually get the SEGV at least once a day. I just don't want to be on the phone when it happens. :)
On Fri, Oct 14, 2016 at 9:51 AM, Alexei Gradinari <alex2grad@xxxxxxxxx> wrote:
Hello Teluu,
What's wrong with my last patch?
Could you, please, review it?
I don't understand why you are so against these changes?
Regards,
Alexei
Friday, October 14, 2016, 4:48:14 AM, you wrote:
--
Hi Alexei,
We appreciate if you could test it out.
We will gladly review this again if your test show any issue.
Best Regards,
Riza
On Fri, Oct 14, 2016 at 12:01 AM, Alexei Gradinari <alex2grad@xxxxxxxxx> wrote:
Hello Riza,
1. You are right... the last_err was set out of locking the write_mutex.
Attached modified patch which set the last_err inside locking the write_mutex,
so it eliminates the SSL object reuse.
2. The read operation (SSL_read) is protected by write_mutex.
Are there other read operations which are not protected by write_mutex?
I concern write_mutex as SSL mutex which should protect the SSL objects.
3. About confirmation that the crash didn't happened with your patch...
If the crash hasn't happened in 2-3 days after you published this patch
it doesn't mean that the crash couldn't happen again.
If I see a code which could lead to the potential crash I'd like to write a code which prevent crashes.
Reusing the SSL object when it has encountered a fatal error could crash
on system with openssl version without
the commit 189e20c68c2399782034fec9f65d6b9ee88aac3d dated Apr 23, 2015.
If you insist I can setup a test system with openssl without this commit
to prove that crash could happen.
Regards,
Alexei
Thursday, October 13, 2016, 12:12:10 AM, you wrote:
Hi Alexei,
Please find the reply, inline.
Best Regards,
Riza
On Thu, Oct 13, 2016 at 12:23 AM, Alexei Gradinari <
alex2grad@xxxxxxxxx> wrote:
Hello Riza,
1. Please, see the next OpenSSL calls.
BIO_pending(ssock->ossl_wbio)
SSL_do_handshake(ssock->ossl_ssl)
SSL_read(ssock->ossl_ssl, data_, size_)
SSL_write(ssock->ossl_ssl, data, (int)size)
All of these calls could use "SSL object when it has encountered a fatal error" or
the SSL object was destroyed.
My patch added this checking, but you removed it.
Why?
- The checks will reduce the potential of of SSL object reuse, however it's not bullet proof.
e.g:
1. thread 1: fatal error occurs in on_read(), but before reset_ssl_sock_state() is called and ssock->last_err is updated, context switch
2. thread 2: ssock_send() executed, call BIO_pending() etc
- write_mutex actually is to protect write operation, as for the fatal error reported (from the call stack),
it happened on read operation which is not protected by write_mutex (write_mutex will be diacquire when destroying SSL state),
so, adding these checks around write_mutex would be less effective.
- based on our test, the crash didn't happen again. This was confirmed by George and another of our user with the same issue.
If you have a different result, please forward them to us.
- lastly (although minor), the fatal error scenario + reset_ssl_sock_state() happened on read operation but the SSL object being accessed was for write (e.g: write BIO).
However we are not certain if there's a correlation between the BIO write and the read operation.
2. The destroy_ssl doesn't destroy write_mutex.
The write_mutex is destroyed in pj_ssl_sock_close.
My patch does nothing with this, so this potential race existed before.
we were not saying that the issue was introduce by your patch.
The ssl should be destroyed as soon as possible to release a TCP socket/port.
To fix potential race issue with write_mutex the group locking could use.
So I combined my patch with your group locking patch.
Attached a new patch.
Best regards,
Alexei mailto:alex2grad@xxxxxxxxx
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
pjsip@xxxxxxxxxxxxxxx
http://lists.pjsip.org/mailman/listinfo/pjsip_lists. pjsip.org
George Joseph
Digium, Inc. | Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com & www.asterisk.org
Digium, Inc. | Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com & www.asterisk.org
_______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip@xxxxxxxxxxxxxxx http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org