Performance problems with OpenSSL and threading

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is for Apache Traffic Server and we have no knobs for turning on/off FIPS.  I am thinking about always disabling FIPS right now and that would happen before we create the threads. 

I was able to get rid of all the FIPS lock connection with the changes you recommend (Big Thanks!).  The big one now is type 1.  I am printing out the log every time the contention total is mod 1M.  Are there any tricks I can do for type 1 locks?

[Apr 30 22:46:49.549] Server {0x7f1e4531d700} ERROR: contention for lock - total contention: 4000000 waiting: 1 file: pmeth_lib.c line: 185 type: 10
[Apr 30 22:46:49.688] Server {0x7f1e45822700} ERROR: contention for lock - total contention: 11000000 waiting: 2 file: err.c line: 469 type: 1
[Apr 30 22:46:50.406] Server {0x7f1e45c26700} ERROR: contention for lock - total contention: 4000000 waiting: 0 file: ex_data.c line: 304 type: 2
[Apr 30 22:46:50.932] Server {0x7f1e45b25700} ERROR: contention for lock - total contention: 12000000 waiting: 5 file: err.c line: 446 type: 1
[Apr 30 22:46:52.001] Server {0x7f1e45721700} ERROR: contention for lock - total contention: 1000000 waiting: 0 file: rand_lib.c line: 212 type: 19

-Bryan




> On Apr 29, 2015, at 11:37 AM, John Foley <foleyj at cisco.com> wrote:
> 
> Correct.  Locks 39/40 are only useful while the POST is running.  Once the POST completes, the POST status never changes again unless the POST runs again.  The only way to run the POST is by invoking FIPS_mode_set(1).  But there should be no reason to invoke FIPS_mode_set(1) more than once unless you want to provide some sort of run-time FIPS mode toggle capability.  
> 
> You're asking in the context of Apache, right?  Apache just has a global FIPS on/off knob in the config.  If set, FIPS is enabled for the lifetime of the process and never disabled/re-enabled. You would invoke FIPS_mode_set(1) once prior to spinning up the worker threads.  Therefore, the POST would be finished prior to any multi-threaded processing.
> 
> 
> 
> On 04/29/2015 01:53 PM, Bryan Call wrote:
>> Can I safely assume that if I call FIPS_mode_set(0) and get a successful return value then I don?t need to lock when there are callbacks for type 39 and 40 locks (for OpenSSL 1.0.1 and 1.0.2)?
>> 
>> -Bryan
>> 
>> 
>> 
>> 
>>> On Apr 28, 2015, at 10:22 AM, John Foley <foleyj at cisco.com <mailto:foleyj at cisco.com>> wrote:
>>> 
>>> In the context of OpenSSL 1.0.1 or 1.0.2, this means FIPS_mode_set(1) has returned with a successful return value.  
>>> 
>>> In the future, this may change if/when OpenSSL does another FIPS validation.  CMVP has changed the implementation guidance that requires the POST to run automatically without user intervention.  Any future FIPS validations would be subject to the new rules.  Hence, the behavior of FIPS_mode_set() may change in the future if OpenSSL decides to pursue another validation.
>>> 
>>> 
>>> 
>>> On 04/28/2015 12:43 PM, Bryan Call wrote:
>>>> What do you mean by ?FIPS POST has completed??
>>>> 
>>>> -Bryan
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Apr 24, 2015, at 4:17 PM, John Foley <foleyj at cisco.com <mailto:foleyj at cisco.com>> wrote:
>>>>> 
>>>>> Some of the algorithms still invoke the FIPS flavor of the algorithm even when FIPS is disabled.  For example, this code is from EVP_DigestUpdate().
>>>>> 
>>>>> int EVP_DigestUpdate(EVP_MD_CTX *ctx, const void *data, size_t count)
>>>>> {
>>>>> #ifdef OPENSSL_FIPS
>>>>>     return FIPS_digestupdate(ctx, data, count);
>>>>> #else
>>>>>     return ctx->update(ctx, data, count);
>>>>> #endif
>>>>> }
>>>>> 
>>>>> In 1.0.1, locks 39 and 40 are:
>>>>> 
>>>>> # define CRYPTO_LOCK_FIPS                39
>>>>> # define CRYPTO_LOCK_FIPS2               40
>>>>> 
>>>>> You should try adding some logic to skip the lock in your lock handler when the lock ID is 39 or 40. Again, it should be safe to not lock on these two as long as the FIPS POST has completed.
>>>>> 
>>>>> 
>>>>> 
>>>>> On 04/24/2015 05:56 PM, Bryan Call wrote:
>>>>>> In my last email I ran the benchmark on Fedora 21 (big mistake).   Here are the results when running it back on the 28/56 core RHEL 6.5 server showing contention in a different place (fips).  Is there a reason it would be calling into the FIPS code to get a lock of FIPS is not enabled?
>>>>>> 
>>>>>> Contention log:
>>>>>> [Apr 24 21:35:09.731] Server {0x7f5529ccd700} ERROR: contention for lock - total contention: 70662 waiting: 16 file: fips.c line: 471 type: 40
>>>>>> [Apr 24 21:35:09.732] Server {0x7f55299ca700} ERROR: contention for lock - total contention: 71605 waiting: 2 file: fips.c line: 107 type: 39
>>>>>> 
>>>>>> 
>>>>>> Line in in fips that see the contention the most:
>>>>>> [bcall at bm77 trafficserver]$ grep fips.c diags.log | cut -f 19 -d ' ' | sort | uniq -c | sort -rn
>>>>>>  875379 471
>>>>>>  288338 107
>>>>>>  275472 127
>>>>>> 
>>>>>> 
>>>>>> Different lock types and their contention frequency (multiple benchmarks):
>>>>>>  875379 40  < - in fips.c
>>>>>>  563810 39  <- also in fips.c
>>>>>>  440518 1
>>>>>>   44145 18
>>>>>>   17447 2
>>>>>>   11356 10
>>>>>>    1377 19
>>>>>>     687 12
>>>>>>     532 25
>>>>>>     302 33
>>>>>>     239 9
>>>>>>      58 16
>>>>>>      40 21
>>>>>>      32 11
>>>>>>      19 3
>>>>>>       1 14
>>>>>> 
>>>>>> 
>>>>>> Code to diable FIPS and log:
>>>>>> #ifdef OPENSSL_FIPS
>>>>>>     int mode = FIPS_mode();
>>>>>>     Debug("ssl", "FIPS_mode: %d", mode);
>>>>>>     mode = FIPS_mode();
>>>>>>     if (mode) {
>>>>>>       FIPS_mode_set(0);
>>>>>>       Debug("ssl", "FIPS_mode: %d", mode);
>>>>>>     }
>>>>>> #endif
>>>>>> 
>>>>>> [Apr 24 21:43:45.860] Server {0x7f7628146800} DEBUG: (ssl) FIPS_mode: 0
>>>>>> 
>>>>>> -Bryan
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Apr 24, 2015, at 10:56 AM, John Foley <foleyj at cisco.com <mailto:foleyj at cisco.com>> wrote:
>>>>>>> 
>>>>>>> When you create the private key given to SSL_CTX, be sure to instantiate a private key for each thread as well.  If you share the same key across all threads, this can introduce contention.
>>>>>>> 
>>>>>>> Here are the type 1 & 2 locks:
>>>>>>> 
>>>>>>> # define CRYPTO_LOCK_ERR                 1
>>>>>>> # define CRYPTO_LOCK_EX_DATA             2
>>>>>>> 
>>>>>>> Have you checked the OpenSSL error log to see what errors are being generated?
>>>>>>> 
>>>>>>> Also, if you disable FIPS, does the thread contention still occur?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 04/24/2015 01:36 PM, Bryan Call wrote:
>>>>>>>> We are using a single SSL_CTX across all the threads, so I will create multiple SSL_CTX per thread.  I implemented dynlock callbacks yesterday, but I didn?t see them being used in the tests I did.  I also added atomic counters to see what type of locks are having contention (see logs below):
>>>>>>>> 
>>>>>>>> Most of the lock contention seems to be around type 2 and type 1 locks when calling our lock callback.  In the FIPS/POST and SSL_CTX recommendations, in your previous email, what types of locks would those help with?
>>>>>>>> 
>>>>>>>> Log output:
>>>>>>>> [Apr 24 10:21:24.339] Server {0xb310000} ERROR: <SSLUtils.cc <http://sslutils.cc/>:178 (SSL_locking_callback)> contention for lock - total contention: 3938 waiting: 1 file: /SourceCache/OpenSSL098/OpenSSL098-52.20.2/src/crypto/ex_data.c line: 496 type: 2
>>>>>>>> [Apr 24 10:21:24.754] Server {0xb10a000} ERROR: <SSLUtils.cc <http://sslutils.cc/>:178 (SSL_locking_callback)> contention for lock - total contention: 1350 waiting: 1 file: /SourceCache/OpenSSL098/OpenSSL098-52.20.2/src/crypto/err/err_def.c line: 343 type: 1
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -Bryan
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Apr 23, 2015, at 4:46 PM, John Foley <foleyj at cisco.com <mailto:foleyj at cisco.com>> wrote:
>>>>>>>>> 
>>>>>>>>> Looking at your call stack, it appears you're testing with FIPS mode enabled.  There's a lock in FIPS_selftest_failed() that's used to ensure the crypto algorithms inside the FIPS boundary are not utilized until after the POST has completed and passed. Unfortunately this lock remains installed for the lifetime of the process.  We've seen thread contention in this area when using OpenSSL.  One work-around is to modify your handler that you pass to SSL_locking_callback() to not lock when the lock ID is CRYPTO_LOCK_FIPS.  But you would only want to disable this particular lock ID after the POST has completed.
>>>>>>>>> 
>>>>>>>>> If the above doesn't help, another item to check is whether you're instantiating an SSL_CTX for each thread.  I've seen some developers use a single SSL_CTX for multiple threads.  This results in a single instance of the server cert and private key being shared across all the threads.  This results in thread contention on the asymmetric key operations associated with the cert and private key (e.g. RSA).  Be sure that each thread instantiates it's                                                       own SSL_CTX and certificate/private key.
>>>>>>>>> 
>>>>>>>>> Lastly, add some debugs to count the locks by lock ID in your lock handler.  There are about 40 different lock IDs defined in crypto/crypto.h.  You'll need to identify which lock is causing the thread contention.  Once you know that, then you can look in the code to see where that lock is used and troubleshoot from there.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 04/23/2015 07:23 PM, Bryan Call wrote:
>>>>>>>>>> I was running a benchmark on a 28 core (56 hyper-threaded) server that is running 84 threads in the process and I am seeing a lot of lock contention.  I saw a lot of lock contention in the calls to SSL_get_error() ssl3_accept().  I am running RHEL 6.5 and openssl-1.0.1e-30.el6_6.7.x86_64.  We have used SSL_locking_callback() to setup pthread mutex locks, but haven?t setup dynlocks yet.  Would dynlocks help in this scenario?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Output from perf:
>>>>>>>>>> Samples: 1M of event 'cycles', Event count (approx.): 602783506611
>>>>>>>>>> - 48.04% [ET_NET 0] [kernel.kallsyms] [k] _spin_lock ?
>>>>>>>>>> - _spin_lock ?
>>>>>>>>>> + 48.05% futex_wait_setup ?
>>>>>>>>>> + 47.79% futex_wake ?
>>>>>>>>>> + 1.20% tcp_v4_rcv ?
>>>>>>>>>> + 0.98% task_rq_lock ?
>>>>>>>>>> + 0.64% double_lock_balance ?
>>>>>>>>>> + 3.94% [ET_NET 0] libcrypto.so.1.0.1e [.] bn_mul_mont ?
>>>>>>>>>> + 1.93% [ET_NET 0] libcrypto.so.1.0.1e [.] BN_usub ?
>>>>>>>>>> + 1.80% [ET_NET 0] libcrypto.so.1.0.1e [.] 0x00000000000e5200 ?
>>>>>>>>>> + 1.49% [ET_NET 0] libcrypto.so.1.0.1e [.] sha256_block_data_order ?
>>>>>>>>>> + 0.97% [ET_NET 0] libcrypto.so.1.0.1e [.] gcm_ghash_clmul ?
>>>>>>>>>> + 0.85% [ET_NET 0] libpthread-2.12.so [.] pthread_mutex_lock ?
>>>>>>>>>> + 0.84% [ET_NET 0] libcrypto.so.1.0.1e [.] aesni_ctr32_encrypt_blocks ?
>>>>>>>>>> + 0.74% [ET_NET 0] libc-2.12.so [.] malloc ?
>>>>>>>>>> + 0.72% [ET_NET 0] libc-2.12.so [.] memcpy ?
>>>>>>>>>> + 0.57% [ET_NET 0] libcrypto.so.1.0.1e [.] BN_mod_mul_montgomery ?
>>>>>>>>>> + 0.51% [ET_NET 0] libpthread-2.12.so [.] pthread_mutex_unlock
>>>>>>>>>> 
>>>>>>>>>> pstack of the process (only a couple of the interesting threads):
>>>>>>>>>> Thread 1 (Thread 0x7f1fba151800 (LWP 1106)):
>>>>>>>>>> #0 0x00007f1fbaaeb264 in __lll_lock_wait () from /lib64/libpthread.so.0
>>>>>>>>>> #1 0x00007f1fbaae6508 in _L_lock_854 () from /lib64/libpthread.so.0
>>>>>>>>>> #2 0x00007f1fbaae63d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
>>>>>>>>>> #3 0x000000000072822f in SSL_locking_callback(int, int, char const*, int) ()
>>>>>>>>>> #4 0x00000031d2ae7f03 in ?? () from /usr/lib64/libcrypto.so.10
>>>>>>>>>> #5 0x00000031d2ae7d8c in ?? () from /usr/lib64/libcrypto.so.10
>>>>>>>>>> #6 0x00000031d2ae80ab in ERR_get_state () from /usr/lib64/libcrypto.so.10
>>>>>>>>>> #7 0x00000031d2ae8557 in ?? () from /usr/lib64/libcrypto.so.10
>>>>>>>>>> #8 0x00000031d2e424a9 in SSL_get_error () from /usr/lib64/libssl.so.10
>>>>>>>>>> #9 0x00000000007249ef in SSLNetVConnection::load_buffer_and_write(long, long&, long&, MIOBufferAccessor&, int&) ()
>>>>>>>>>> #10 0x00000000007389df in write_to_net_io(NetHandler*, UnixNetVConnection*, EThread*) ()
>>>>>>>>>> #11 0x0000000000738523 in write_to_net(NetHandler*, UnixNetVConnection*, EThread*) ()
>>>>>>>>>> #12 0x0000000000731922 in NetHandler::mainNetEvent(int, Event*) ()
>>>>>>>>>> #13 0x00000000004f6d48 in Continuation::handleEvent(int, void*) ()
>>>>>>>>>> #14 0x000000000075877a in EThread::process_event(Event*, int) ()
>>>>>>>>>> #15 0x0000000000758c84 in EThread::ex
>>>>>>>>>> 
>>>>>>>>>> Thread 56 (Thread 0x7f1fb0ecd700 (LWP 1172)):
>>>>>>>>>> #0 0x00007f1fbaae77d9 in pthread_mutex_unlock () from /lib64/libpthread.so.0
>>>>>>>>>> #1 0x0000000000728265 in SSL_locking_callback(int, int, char const*, int) ()
>>>>>>>>>> #2 0x00000031d2b57402 in ?? () from /usr/lib64/libcrypto.so.10
>>>>>>>>>> #3 0x00000031d2b574bd in FIPS_selftest_failed () from /usr/lib64/libcrypto.so.10
>>>>>>>>>> #4 0x00000031d2ae9f1c in EVP_DigestInit_ex () from /usr/lib64/libcrypto.so.10
>>>>>>>>>> #5 0x00000031d2a7977e in HMAC_Init_ex () from /usr/lib64/libcrypto.so.10
>>>>>>>>>> #6 0x00000031d2a79c29 in ?? () from /usr/lib64/libcrypto.so.10
>>>>>>>>>> #7 0x00000031d2af7a42 in EVP_PKEY_CTX_ctrl () from /usr/lib64/libcrypto.so.10
>>>>>>>>>> #8 0x00000031d2ae9ffb in EVP_DigestInit_ex () from /usr/lib64/libcrypto.so.10
>>>>>>>>>> #9 0x00000031d2af952a in ?? () from /usr/lib64/libcrypto.so.10
>>>>>>>>>> #10 0x00000031d2e34cf2 in tls1_PRF () from /usr/lib64/libssl.so.10
>>>>>>>>>> #11 0x00000031d2e35796 in tls1_final_finish_mac () from /usr/lib64/libssl.so.10
>>>>>>>>>> #12 0x00000031d2e2db2e in ssl3_get_message () from /usr/lib64/libssl.so.10
>>>>>>>>>> #13 0x00000031d2e1dc3c in ssl3_get_cert_verify () from /usr/lib64/libssl.so.10
>>>>>>>>>> #14 0x00000031d2e220d6 in ssl3_accept () from /usr/lib64/libssl.so.10
>>>>>>>>>> #15 0x0000000000725287 in SSLNetVConnection::sslServerHandShakeEvent(int&) ()
>>>>>>>>>> #16 0x00000000007251b3 in SSLNetVConnection::sslStartHandShake(int, int&) ()
>>>>>>>>>> #17 0x0000000000723df3 in SSLNetVConnection::net_read_io(NetHandler*, EThread*) ()
>>>>>>>>>> #18 0x0000000000731852 in NetHandler::mainNetEvent(int, Event*) ()
>>>>>>>>>> #19 0x00000000004f6d48 in Continuation::handleEvent(int, void*) ()
>>>>>>>>>> #20 0x000000000075877a in EThread::process_event(Event*, int) ()
>>>>>>>>>> #21 0x0000000000758c84 in EThread::execute() ()
>>>>>>>>>> #22 0x0000000000757cf8 in spawn_thread_internal(void*) ()
>>>>>>>>>> #23 0x00007f1fbaae49d1 in start_thread () from /lib64/libpthread.so.0
>>>>>>>>>> #24 0x00000030ff0e88fd in clone () from /lib64/libc.so.6
>>>>>>>>>> 
>>>>>>>>>> -Bryan
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> _______________________________________________
>>>>>>>>>> openssl-users mailing list
>>>>>>>>>> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users <https://mta.openssl.org/mailman/listinfo/openssl-users>
>>>>>>>>> 
>>>>>>>>> _______________________________________________
>>>>>>>>> openssl-users mailing list
>>>>>>>>> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users <https://mta.openssl.org/mailman/listinfo/openssl-users>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> openssl-users mailing list
>>>>>>>> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users <https://mta.openssl.org/mailman/listinfo/openssl-users>
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> openssl-users mailing list
>>>>>>> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users <https://mta.openssl.org/mailman/listinfo/openssl-users>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> openssl-users mailing list
>>>>>> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users <https://mta.openssl.org/mailman/listinfo/openssl-users>
>>>>> 
>>>>> _______________________________________________
>>>>> openssl-users mailing list
>>>>> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users <https://mta.openssl.org/mailman/listinfo/openssl-users>
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> openssl-users mailing list
>>>> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users <https://mta.openssl.org/mailman/listinfo/openssl-users>
>>> 
>>> _______________________________________________
>>> openssl-users mailing list
>>> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users <https://mta.openssl.org/mailman/listinfo/openssl-users>
>> 
>> 
>> 
>> _______________________________________________
>> openssl-users mailing list
>> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users <https://mta.openssl.org/mailman/listinfo/openssl-users>
> 
> _______________________________________________
> openssl-users mailing list
> To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mta.openssl.org/pipermail/openssl-users/attachments/20150430/62bd6e06/attachment-0001.html>


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux