Jeff/Community,

Getting back to this thread after a long time. We have tried many things since the initial issue: moved to Linux, tried the latest Apache/APR/APR-util binaries, tried adjusting the configuration, etc. All of these eventually failed in the same way: multiple hung threads eventually overloading the server.

In our current environment we switched to the prefork MPM, thinking that maybe threading was killing us. This seemed to work well until day 20 (which seems to be relevant, as we have reached day 20 a few times). Today all 200 procs (Max Servers) were launched and not one would die. All hung.

The root proc is in this state:

$ sudo pstack 5362
#0 0x00000039892e1353 in __select_nocancel () from /lib64/libc.so.6
#1 0x00007ffff7989025 in apr_sleep () from /codeadm/http_servers/httpd-2.4.16-prefork/lib/libapr-1.so.0
#2 0x00000000004325ec in ap_wait_or_timeout ()
#3 0x0000000000469680 in prefork_run ()
#4 0x000000000043171e in ap_run_mpm ()
#5 0x000000000042b9e4 in main ()

A typical pstack from a hung proc is:

$ sudo pstack 6100
#0 0x00007ffff7dd4955 in move_block () from /codeadm/http_servers/httpd-2.4.16-prefork/lib/libaprutil-1.so.0
#1 0x00007ffff7dd50a1 in apr_rmm_calloc () from /codeadm/http_servers/httpd-2.4.16-prefork/lib/libaprutil-1.so.0
#2 0x00007ffff5f26c66 in util_ald_strdup () from /codeadm/http_servers/httpd/modules/mod_ldap.so
#3 0x00007ffff5f2628a in util_ldap_search_node_copy () from /codeadm/http_servers/httpd/modules/mod_ldap.so
#4 0x00007ffff5f27235 in util_ald_cache_insert () from /codeadm/http_servers/httpd/modules/mod_ldap.so
#5 0x00007ffff5f2352d in uldap_cache_checkuserid () from /codeadm/http_servers/httpd/modules/mod_ldap.so
#6 0x00007ffff6b459ae in authn_ldap_check_password () from /codeadm/http_servers/httpd/modules/mod_authnz_ldap.so
#7 0x00007ffff673ae4f in authenticate_basic_user () from /codeadm/http_servers/httpd/modules/mod_auth_basic.so
#8 0x0000000000441c90 in ap_run_check_user_id ()
#9 0x00000000004451d2 in ap_process_request_internal ()
#10 0x00000000004627d8 in ap_process_async_request ()
#11 0x000000000046294f in ap_process_request ()
#12 0x000000000045ec9e in ap_process_http_connection ()
#13 0x00000000004567f0 in ap_run_process_connection ()
#14 0x000000000046900e in child_main ()
#15 0x0000000000469264 in make_child ()
#16 0x0000000000469d87 in prefork_run ()
#17 0x000000000043171e in ap_run_mpm ()
#18 0x000000000042b9e4 in main ()
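
To check whether every hung child is stuck in the same place, something like the following sketch could tally the innermost frames across several dumps. It assumes single-threaded (prefork) pstack output in the format shown above; the names `top_frames` and `tally` are hypothetical helpers, not part of any tool.

```python
import re
from collections import Counter

# Matches pstack frame lines such as
# "#0 0x00007ffff7dd4955 in move_block () from ..."
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\S+)\s*\(")


def top_frames(pstack_text, depth=2):
    """Return the function names of the innermost `depth` frames of one dump."""
    names = []
    for line in pstack_text.splitlines():
        m = FRAME_RE.match(line.strip())
        # Frame numbers start at 0 for the innermost call.
        if m and int(m.group(1)) < depth:
            names.append(m.group(2))
    return tuple(names)


def tally(dumps, depth=2):
    """Count how many dumps share the same innermost frames."""
    return Counter(top_frames(d, depth) for d in dumps)
```

Feeding it the pstack text of each hung PID would show how many of them stop in `move_block` / `apr_rmm_calloc`, as the one above does.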

Running on Red Hat Enterprise Linux Server release 6.6 (Santiago) with httpd-2.4.16-prefork.

Killing off these hung procs only band-aids the situation. New procs also hang (building up slowly now).

I am going to have to do a full restart of the server. My expectation is that the server will be fine again for another 20 days.

Grasping at straws now. Any thoughts on this? Anything to try?

Thanks,
Mj