Following your advice, I was able to get some stack traces while the server was hanging or slow to respond. These are from one of our search hosts.
I have shortened them considerably because customer data is present; I can do some more scrubbing later if it will help. The problem seems to revolve around indexes. I know we increased our allidslimit quite high, to 500000, and I'm wondering if that has anything to do with it.
Thread 3 (Thread 0x2aef51f20940 (LWP 2569)):
#0 0x000000328800b019 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00002aeeae1ba4f6 in __db_pthread_mutex_lock () from /lib64/libdb-4.3.so
No symbol table info available.
#2 0x00002aeeae242619 in __lock_get_internal () from /lib64/libdb-4.3.so
No symbol table info available.
#3 0x00002aeeae242b7f in __lock_vec () from /lib64/libdb-4.3.so
No symbol table info available.
#4 0x00002aeeae222d30 in __db_lget () from /lib64/libdb-4.3.so
No symbol table info available.
#5 0x00002aeeae1cac72 in __bam_search () from /lib64/libdb-4.3.so
No symbol table info available.
#6 0x00002aeeae1bd8d7 in ?? () from /lib64/libdb-4.3.so
No symbol table info available.
#7 0x00002aeeae1bea4f in ?? () from /lib64/libdb-4.3.so
No symbol table info available.
#8 0x00002aeeae218829 in __db_c_get () from /lib64/libdb-4.3.so
No symbol table info available.
#9 0x00002aeeadf289ed in idl_new_fetch (be=0x1dd03130, db=<value optimized out>, inkey=0x2aef51f10760, txn=<value optimized out>, a=0x1dd44940, flag_err=0x2aef51f175bc, allidslimit=500000) at ldap/servers/slapd/back-ldbm/idl_new.c:223
Thread 8 (Thread 0x2aef4ed1b940 (LWP 2564)):
#0 0x000000328800e5c8 in pread64 () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00002aeeae25c5dd in __os_io () from /lib64/libdb-4.3.so
No symbol table info available.
#2 0x00002aeeae25168b in __memp_pgread () from /lib64/libdb-4.3.so
No symbol table info available.
#3 0x00002aeeae2527dd in __memp_fget () from /lib64/libdb-4.3.so
No symbol table info available.
#4 0x00002aeeae1ca938 in __bam_search () from /lib64/libdb-4.3.so
No symbol table info available.
#5 0x00002aeeae1bd8d7 in ?? () from /lib64/libdb-4.3.so
No symbol table info available.
#6 0x00002aeeae1bea4f in ?? () from /lib64/libdb-4.3.so
No symbol table info available.
#7 0x00002aeeae218829 in __db_c_get () from /lib64/libdb-4.3.so
No symbol table info available.
#8 0x00002aeeae220fe6 in __db_get () from /lib64/libdb-4.3.so
No symbol table info available.
#9 0x00002aeeae22115a in __db_get_pp () from /lib64/libdb-4.3.so
No symbol table info available.
#10 0x00002aeeadf24266 in id2entry (be=0x1dd03130, id=7630577, txn=0x2aef4ed104e0, err=0x2aef4ed10544) at ldap/servers/slapd/back-ldbm/id2entry.c:315
inst = (ldbm_instance *) 0x1dc8d180
db = (DB *) 0x1dd01080
db_txn = (DB_TXN *) 0x0
key = {data = "" size = 4, ulen = 0, dlen = 0, doff = 0, flags = 0}
data = {data = 0x0, size = 0, ulen = 0, dlen = 0, doff = 0, flags = 4}
e = (struct backentry *) 0x0
ee = <value optimized out>
temp_id = "\000tnñ"
And another locked worker thread:
#0 0x000000328800d654 in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x0000003288008f4a in _L_lock_1034 () from /lib64/libpthread.so.0
No symbol table info available.
#2 0x0000003288008e0c in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#3 0x00002aeeae1ba54c in __db_pthread_mutex_lock () from /lib64/libdb-4.3.so
No symbol table info available.
#4 0x00002aeeae252a51 in __memp_fget () from /lib64/libdb-4.3.so
No symbol table info available.
#5 0x00002aeeae218d73 in __db_c_get () from /lib64/libdb-4.3.so
No symbol table info available.
#6 0x00002aeeadf28b63 in idl_new_fetch (be=0x1dd03130, db=<value optimized out>, inkey=0x735755, txn=<value optimized out>, a=0x1dd421f0, flag_err=0x2aef4e3115bc, allidslimit=500000) at ldap/servers/slapd/back-ldbm/idl_new.c:298
And the replication thread appears to be locked as well:
#0 0x000000328800d654 in __lll_lock_wait () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x0000003288008f80 in _L_lock_1233 () from /lib64/libpthread.so.0
No symbol table info available.
#2 0x0000003288008f03 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#3 0x000000328ac23289 in PR_Lock () from /usr/lib64/libnspr4.so
No symbol table info available.
#4 0x000000328ac234cb in PR_EnterMonitor () from /usr/lib64/libnspr4.so
No symbol table info available.
#5 0x00002aeeadf1496c in cache_lock_entry (cache=0x1dc8d208, e=0x2af02d468c00) at ldap/servers/slapd/back-ldbm/cache.c:1455
No locals.
#6 0x00002aeeadf23b31 in find_entry_internal (pb=0x2af022054ca0, be=0x1dd03130, addr=<value optimized out>, lock=1, txn=0x2aef3ddf9cb0, flags=0) at ldap/servers/slapd/back-ldbm/findentry.c:237
No locals.
#7 0x00002aeeadf4df1a in ldbm_back_modify (pb=0x2af022054ca0) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:269
On Wed, Aug 21, 2013 at 9:14 AM, Rich Megginson <rmeggins@xxxxxxxxxx> wrote:
On 08/21/2013 09:53 AM, David Boreham wrote:
gdb will give much more detail
Another thing you might try:
While the server is under stress, run the "pstack" command a few times and save the output.
http://port389.org/wiki/FAQ#Debugging_Hangs
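The "run it a few times and save the output" step can be scripted. What follows is only a minimal sketch, not anything shipped with 389: the function name capture_stacks, the output file naming, and the default count and interval are all made up for illustration. It runs whatever command you hand it (for example pstack followed by the ns-slapd pid) several times and writes each run to its own file.

```python
import subprocess
import time
from pathlib import Path


def capture_stacks(cmd, count=3, interval=5.0, out_dir="stacks"):
    """Run `cmd` `count` times, `interval` seconds apart.

    Each run's stdout and stderr are written to a separate file in
    `out_dir`. Returns the list of files written. The name and the
    defaults are hypothetical, chosen only for this sketch.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i in range(count):
        # Capture the output even if the command exits with an error.
        result = subprocess.run(cmd, capture_output=True, text=True)
        path = out / f"stack.{i:02d}.txt"
        path.write_text(result.stdout + result.stderr)
        saved.append(path)
        if i < count - 1:
            time.sleep(interval)
    return saved
```

Something like capture_stacks(["pstack", "2560"]) would then leave three snapshots in ./stacks, where 2560 is a placeholder for the real server pid.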
If you post the thread stacks here, someone familiar with the code can say with more accuracy what's going on. For example it will be obvious whether you have starved out the thread pool, or you have threads mostly waiting on page locks in the DB, etc.
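For a first pass before someone familiar with the code looks at the stacks, a rough tally of what each thread is blocked on can help. The sketch below is a heuristic only: the labels and the function names it matches on (__db_pthread_mutex_lock, pread, PR_Lock, PR_EnterMonitor) are simply the ones visible in the traces in this thread, and the helper names split_stacks, classify and summarize are made up for illustration. It treats every "#0" frame as the start of a new stack, so it also copes with stacks pasted without a "Thread N" header.

```python
import re
from collections import Counter

# Matches gdb/pstack frame lines such as
#   "#5 0x00002aeeae1cac72 in __bam_search () from /lib64/libdb-4.3.so"
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def split_stacks(text):
    """Return a list of stacks; each is a list of function names, innermost first."""
    stacks = []
    for line in text.splitlines():
        m = FRAME.match(line.strip())
        if not m:
            continue
        if m.group(1) == "0":  # frame #0 starts a new stack
            stacks.append([])
        if stacks:
            stacks[-1].append(m.group(2))
    return stacks


def classify(stack):
    """Heuristic label for one stack (an assumption, not a diagnosis)."""
    if any("__db_pthread_mutex_lock" in f for f in stack):
        return "waiting on a libdb mutex"
    if stack and stack[0].startswith("pread"):
        return "reading a page from disk"
    if any(f in ("PR_Lock", "PR_EnterMonitor") for f in stack):
        return "waiting on an NSPR lock"
    return "other"


def summarize(text):
    """Count how many stacks fall under each heuristic label."""
    return Counter(classify(s) for s in split_stacks(text))
```

Feeding it the saved gdb or pstack output, e.g. summarize(open("stack.00.txt").read()), gives a count per label. If nearly every worker thread lands in the libdb mutex bucket, that points toward contention on DB pages or locks rather than a starved thread pool, but the full stacks are still needed to confirm it.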
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users