I am seeing something very odd on our Apache 2.4.12 server (SunOS myhostname 5.10 Generic_118833-36 sun4v sparc SUNW,Sun-Fire-T200).
We are using MPM Worker.
I have been watching the scoreboard all day, monitoring system load and running processes/threads.
Around 10 AM the load jumped from a normal <1 to >7, then made its way up to >20, where it has sat all day with 21 threads in status "W".
I traced the threads back to the actual users here at work and asked them what they did. No help there, other than that both of them made requests to the server in rapid succession (one "restored" a browser session, the other rapidly clicked some URLs in a Word doc). One user even rebooted for me (no effect on Apache).
In any case, I have 21 threads in the "W" state.
The server has even gone on and created new processes, leaving these procs behind with one or more threads still active. But the load will not drop!
The pstack output for one hung process (this one has only one hung thread) looks like this:
3260: /codeadm/http_servers/httpd/bin/httpd -f /codeadm/http_servers/httpd/c
----------------- lwp# 1 / thread# 1 --------------------
ff041714 lwp_wait (10, ffbff2ec)
ff03d11c _thrp_join (10, 0, ffbff354, 1, ffbff2ec, ff06cbc0) + 34
ff24fd08 apr_thread_join (ffbff3d4, 1ef320, ff06cbc0, 0, 0, ff3a2000) + 48
000d4490 join_workers (1ef4a0, 1f4a88, 1, 1eef00, 1eee50, 1883d0) + 2f8
000d4e80 child_main (2, d1988, ff06cbc0, 0, 0, ff3a2000) + 7f8
000d50a8 make_child (1883d0, 2, 134518, 7, 0, 1883d0) + 1b0
000d5cb0 perform_idle_server_maintenance (ffbff69c, ffbff698, ffbff684, 163188, 1883d0, ff3a0140) + a28
000d6300 server_main_loop (0, 0, 134518, 7, 0, 1883d0) + 548
000d67e8 worker_run (134518, 18a470, 1883d0, 150000, ff3a0100, ff3a0140) + 490
0005dd28 ap_run_mpm (163188, 18a470, 1883d0, 1883d0, 0, 0) + a8
0004e0e0 main (5, ffbff8cc, ffbff8e4, 150000, ff3a0100, ff3a0140) + 17b0
0004b3b4 _start (0, 0, 0, 0, 0, 0) + dc
----------------- lwp# 16 / thread# 16 --------------------
ff31dcc4 find_block_by_offset (19c550, 10, d778, 1, 0, 314628) + 8c
ff31e218 move_block (19c550, d778, 0, 0, 2, 0) + 228
ff31f44c apr_rmm_calloc (19c550, 18, fe8e4af8, c, 0, 314628) + 1fc
fe8e07bc util_ald_alloc (fe580670, 18, 0, 0, 2, 0) + 7c
fe8e1f20 util_ald_cache_insert (fe580670, fd0f9898, fe8e4af8, c, 0, 314628) + 170
fe8d9d2c uldap_cache_checkuserid (fe8e4af8, 0, 0, 0, 2, 0) + 1044
fe9e3f74 authn_ldap_check_password (0, fd0f99ac, 31609f, fd0f9998, 80808080, 1010101) + 834
fe982470 authenticate_basic_user (314628, 0, 3145e8, 8d, 237120, 25aec0) + 608
0007f750 ap_run_check_user_id (314628, 236e78, 236e78, 2, d, 25aec0) + 90
000818fc ap_process_request_internal (314628, 0, 3145e8, 8d, 237120, 25aec0) + 6e4
000c5288 ap_process_async_request (314628, 236e78, 236e78, 2, d, 25aec0) + 638
000c5428 ap_process_request (314628, 4, 314628, 8d, 237120, 25aec0) + 20
000bddc0 ap_process_http_sync_connection (237128, 236e78, 236e78, 2, d, 25aec0) + f0
000bdfbc ap_process_http_connection (237128, 236e78, 236e78, 8d, 237120, 25aec0) + 64
000ab038 ap_run_process_connection (237128, 236e78, 236e78, 2, d, 25aec0) + 90
000ab9bc ap_process_connection (237128, 236e78, 236e78, 8d, 237120, 25aec0) + 8c
000d235c process_socket (1ef320, 236e30, 236e78, 2, d, 25aec0) + ec
000d373c worker_thread (1ef320, 1f6ef0, 0, 0, 0, 0) + 49c
ff24f894 dummy_worker (1ef320, fd0fc000, 0, 0, ff24f840, 1) + 54
ff0404f4 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 17 / thread# 17 --------------------
ff24f840 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 18 / thread# 18 --------------------
ff24f840 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 19 / thread# 19 --------------------
ff24f840 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 20 / thread# 20 --------------------
ff24f840 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 21 / thread# 21 --------------------
ff24f840 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 22 / thread# 22 --------------------
ff24f840 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 23 / thread# 23 --------------------
ff24f840 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 24 / thread# 24 --------------------
ff24f840 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 25 / thread# 25 --------------------
ff24f840 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 26 / thread# 26 --------------------
ff24f840 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 27 / thread# 27 --------------------
ff24f840 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
newyahoo%
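To compare several hung processes without reading each dump by hand, the pstack output can be reduced to one entry per thread. The following is only a rough Python sketch: `summarize_pstack` is a hypothetical helper name, and it assumes the layout shown above (a dashed `lwp# N / thread# N` header followed by either frame lines or a `** zombie` marker).

```python
import re

# Matches pstack thread headers such as
# "----------------- lwp# 16 / thread# 16 --------------------"
HEADER = re.compile(r"lwp#\s*(\d+)\s*/\s*thread#\s*(\d+)")


def summarize_pstack(text):
    """Map each lwp id to 'zombie' or to the name of its top stack frame."""
    threads = {}
    current = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            current = int(header.group(1))
            threads[current] = None  # top frame not seen yet
            continue
        if current is None:
            continue  # preamble before the first thread header
        if "** zombie" in line:
            threads[current] = "zombie"
        elif threads[current] is None:
            parts = line.split()
            # Frame lines look like "<address> <function> (<args>) + <offset>"
            if len(parts) >= 2:
                threads[current] = parts[1].rstrip(",")
    return threads
```

Any thread whose value is not "zombie" is still live, and its top frame shows where it is sitting.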
The partial scoreboard looks like this:
Server Version: Apache/2.4.12 (Unix)
Server MPM: worker
Server Built: Jun 3 2015 17:19:20
Current Time: Tuesday, 16-Jun-2015 15:01:45 PDT
Restart Time: Monday, 08-Jun-2015 14:30:49 PDT
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 8 days 30 minutes 55 seconds
Server load: 23.09 22.46 21.88
Total accesses: 68346 - Total Traffic: 10.0 GB
CPU Usage: u97541.5 s126.35 cu787.35 cs139.55 - 14.2% CPU load
.0986 requests/sec - 15.1 kB/second - 152.7 kB/request
6 requests currently being processed, 94 idle workers
_____________WW_____W__W_____________W_____W______.............W
....................W..W.W.W...W...W..........W.......WW..W.....
..........W...W..W.W..__________________________________________
________
Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process
netstat shows some hung connections in the "CLOSE_WAIT" state for one (but not all) of the hosts that have hung threads/connections:
newyahoo% netstat | grep clienthostname
newyahoo.WWW clienthostname.62580 65142 0 49896 0 CLOSE_WAIT
newyahoo.WWW clienthostname.62579 65142 0 49896 0 CLOSE_WAIT
newyahoo.WWW clienthostname.62582 65142 0 49896 0 CLOSE_WAIT
newyahoo.WWW clienthostname.62591 65142 0 49896 0 CLOSE_WAIT
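To check every client at once rather than grepping one host at a time, the netstat output can be grouped by remote host. A minimal Python sketch follows; `close_wait_by_host` is a hypothetical name, and it assumes the column layout shown above (local address, remote address as `host.port`, and the state in the last column).

```python
from collections import Counter


def close_wait_by_host(netstat_output):
    """Count CLOSE_WAIT connections per remote host in netstat-style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: local addr, remote addr, ..., state (as in the lines above)
        if len(fields) < 3 or fields[-1] != "CLOSE_WAIT":
            continue
        # The remote address is printed as "host.port"; drop the port
        host = fields[1].rsplit(".", 1)[0]
        counts[host] += 1
    return counts
```

Feeding it the full `netstat` output would show which clients still have half-closed connections and how many each.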
Can anyone assist in debugging this?
I would love to have these threads exit without having to manually restart the server.
Thanks
MJ