Hung thread

I am seeing something very odd on our Apache 2.4.12 server (SunOS myhostname 5.10 Generic_118833-36 sun4v sparc SUNW,Sun-Fire-T200).
We are using MPM Worker.

I have been watching the scoreboard all day monitoring system load and running processes/threads.
Around 10 AM the load jumped from a normal < 1 to > 7, then made its way up to > 20, where it has sat all day with 21 threads in status "W".
I traced the threads back to the actual users here at work and asked them what they did. No help there, other than that they both made requests to the server in rapid succession (one "restored" a browser session, the other rapidly clicked some URLs in a Word doc). One user even rebooted for me (no effect on Apache).

In any case I have 21 threads in "W" state.

The server has even gone on to create new processes, leaving these processes behind with one or more threads still active. But the load will not drop!

The pstack output of a hung process (this one has only one hung thread) looks like this:


3260:   /codeadm/http_servers/httpd/bin/httpd -f /codeadm/http_servers/httpd/c
-----------------  lwp# 1 / thread# 1  --------------------
 ff041714 lwp_wait (10, ffbff2ec)
 ff03d11c _thrp_join (10, 0, ffbff354, 1, ffbff2ec, ff06cbc0) + 34
 ff24fd08 apr_thread_join (ffbff3d4, 1ef320, ff06cbc0, 0, 0, ff3a2000) + 48
 000d4490 join_workers (1ef4a0, 1f4a88, 1, 1eef00, 1eee50, 1883d0) + 2f8
 000d4e80 child_main (2, d1988, ff06cbc0, 0, 0, ff3a2000) + 7f8
 000d50a8 make_child (1883d0, 2, 134518, 7, 0, 1883d0) + 1b0
 000d5cb0 perform_idle_server_maintenance (ffbff69c, ffbff698, ffbff684, 163188, 1883d0, ff3a0140) + a28
 000d6300 server_main_loop (0, 0, 134518, 7, 0, 1883d0) + 548
 000d67e8 worker_run (134518, 18a470, 1883d0, 150000, ff3a0100, ff3a0140) + 490
 0005dd28 ap_run_mpm (163188, 18a470, 1883d0, 1883d0, 0, 0) + a8
 0004e0e0 main     (5, ffbff8cc, ffbff8e4, 150000, ff3a0100, ff3a0140) + 17b0
 0004b3b4 _start   (0, 0, 0, 0, 0, 0) + dc
-----------------  lwp# 16 / thread# 16  --------------------
 ff31dcc4 find_block_by_offset (19c550, 10, d778, 1, 0, 314628) + 8c
 ff31e218 move_block (19c550, d778, 0, 0, 2, 0) + 228
 ff31f44c apr_rmm_calloc (19c550, 18, fe8e4af8, c, 0, 314628) + 1fc
 fe8e07bc util_ald_alloc (fe580670, 18, 0, 0, 2, 0) + 7c
 fe8e1f20 util_ald_cache_insert (fe580670, fd0f9898, fe8e4af8, c, 0, 314628) + 170
 fe8d9d2c uldap_cache_checkuserid (fe8e4af8, 0, 0, 0, 2, 0) + 1044
 fe9e3f74 authn_ldap_check_password (0, fd0f99ac, 31609f, fd0f9998, 80808080, 1010101) + 834
 fe982470 authenticate_basic_user (314628, 0, 3145e8, 8d, 237120, 25aec0) + 608
 0007f750 ap_run_check_user_id (314628, 236e78, 236e78, 2, d, 25aec0) + 90
 000818fc ap_process_request_internal (314628, 0, 3145e8, 8d, 237120, 25aec0) + 6e4
 000c5288 ap_process_async_request (314628, 236e78, 236e78, 2, d, 25aec0) + 638
 000c5428 ap_process_request (314628, 4, 314628, 8d, 237120, 25aec0) + 20
 000bddc0 ap_process_http_sync_connection (237128, 236e78, 236e78, 2, d, 25aec0) + f0
 000bdfbc ap_process_http_connection (237128, 236e78, 236e78, 8d, 237120, 25aec0) + 64
 000ab038 ap_run_process_connection (237128, 236e78, 236e78, 2, d, 25aec0) + 90
 000ab9bc ap_process_connection (237128, 236e78, 236e78, 8d, 237120, 25aec0) + 8c
 000d235c process_socket (1ef320, 236e30, 236e78, 2, d, 25aec0) + ec
 000d373c worker_thread (1ef320, 1f6ef0, 0, 0, 0, 0) + 49c
 ff24f894 dummy_worker (1ef320, fd0fc000, 0, 0, ff24f840, 1) + 54
 ff0404f4 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 17 / thread# 17  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 18 / thread# 18  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 19 / thread# 19  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 20 / thread# 20  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 21 / thread# 21  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 22 / thread# 22  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 23 / thread# 23  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 24 / thread# 24  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 25 / thread# 25  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 26 / thread# 26  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
-----------------  lwp# 27 / thread# 27  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
newyahoo% 
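
In the dump above, every worker except lwp 16 is a zombie waiting to be joined, and lwp 1 is sitting in join_workers. To pick out the live threads from a long pstack dump without reading it by eye, here is a minimal Python sketch. It assumes the header and frame layout shown above; the names live_threads, HEADER and FRAME are my own and not part of any tool.

```python
import re

# Header lines look like:
# "-----------------  lwp# 16 / thread# 16  --------------------"
HEADER = re.compile(r"^-+\s+lwp#\s*(\d+)\s*/\s*thread#\s*(\d+)\s+-+\s*$")
# Frame lines look like:
# " ff31dcc4 find_block_by_offset (19c550, 10, ...) + 8c"
FRAME = re.compile(r"^\s*[0-9a-fA-F]+\s+(\w+)\s*\(")


def live_threads(pstack_text):
    """Return {lwp id: [function names, innermost first]} for non-zombie threads."""
    threads = {}
    zombies = set()
    current = None
    for line in pstack_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        if current is None:
            continue  # skip the "pid: command" line before the first header
        if "** zombie" in line:
            zombies.add(current)
            continue
        frame = FRAME.match(line)
        if frame:
            threads[current].append(frame.group(1))
    return {lwp: frames for lwp, frames in threads.items() if lwp not in zombies}


if __name__ == "__main__":
    sample = """3260:   /path/to/httpd
-----------------  lwp# 1 / thread# 1  --------------------
 ff041714 lwp_wait (10, ffbff2ec)
-----------------  lwp# 16 / thread# 16  --------------------
 ff31dcc4 find_block_by_offset (19c550, 10) + 8c
 ff31e218 move_block (19c550, d778) + 228
-----------------  lwp# 17 / thread# 17  --------------------
 ff24f840 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
"""
    for lwp, frames in live_threads(sample).items():
        print(lwp, frames[0] if frames else "(no frames)")
```

The first entry in each list is the innermost frame, which is the place to look when comparing several hung processes against each other.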



A partial scoreboard looks like this:

Server Version: Apache/2.4.12 (Unix)
Server MPM: worker
Server Built: Jun 3 2015 17:19:20

Current Time: Tuesday, 16-Jun-2015 15:01:45 PDT
Restart Time: Monday, 08-Jun-2015 14:30:49 PDT
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 8 days 30 minutes 55 seconds
Server load: 23.09 22.46 21.88
Total accesses: 68346 - Total Traffic: 10.0 GB
CPU Usage: u97541.5 s126.35 cu787.35 cs139.55 - 14.2% CPU load
.0986 requests/sec - 15.1 kB/second - 152.7 kB/request
6 requests currently being processed, 94 idle workers

_____________WW_____W__W_____________W_____W______.............W
....................W..W.W.W...W...W..........W.......WW..W.....
..........W...W..W.W..__________________________________________
________
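
The scoreboard rows above can be tallied rather than counted by eye. This is a minimal Python sketch, assuming the rows are pasted in as plain text with one character per worker slot; the function name count_states is hypothetical.

```python
from collections import Counter


def count_states(scoreboard):
    """Tally each scoreboard character, ignoring line breaks between rows."""
    return Counter(ch for ch in scoreboard if not ch.isspace())


if __name__ == "__main__":
    # Short made-up rows for illustration, not the real scoreboard.
    sample = "__WW_..W\n.W__"
    counts = count_states(sample)
    for state, n in sorted(counts.items()):
        print(state, n)
```

Looking up counts["W"] gives the number of slots in the "W" state, which can then be compared with the "requests currently being processed" line of the same status page.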

Scoreboard Key:
"_" Waiting for Connection, "S" Starting up, "R" Reading Request,
"W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
"C" Closing connection, "L" Logging, "G" Gracefully finishing,
"I" Idle cleanup of worker, "." Open slot with no current process


netstat shows some hung connections in "CLOSE_WAIT" state for one of the hosts (but not all) that have hung threads/connections:

newyahoo% netstat | grep clienthostname
newyahoo.WWW         clienthostname.62580 65142      0 49896      0 CLOSE_WAIT
newyahoo.WWW         clienthostname.62579 65142      0 49896      0 CLOSE_WAIT
newyahoo.WWW         clienthostname.62582 65142      0 49896      0 CLOSE_WAIT
newyahoo.WWW         clienthostname.62591 65142      0 49896      0 CLOSE_WAIT
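
To see how many CLOSE_WAIT connections each client is holding, the netstat rows can be grouped by remote host. This is a minimal Python sketch, assuming the seven-column layout seen in the rows above (local address, remote address, four numeric columns, state) and a remote address of the form host.port; the function name close_wait_by_host is hypothetical.

```python
from collections import Counter


def close_wait_by_host(netstat_text):
    """Count CLOSE_WAIT connections per remote host from netstat rows."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed layout: local remote <4 numeric columns> state
        if len(fields) < 7 or fields[-1] != "CLOSE_WAIT":
            continue
        # The remote address is "host.port"; strip the trailing port.
        host, _, _port = fields[1].rpartition(".")
        counts[host or fields[1]] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "newyahoo.WWW  clienthostname.62580 65142 0 49896 0 CLOSE_WAIT\n"
        "newyahoo.WWW  clienthostname.62579 65142 0 49896 0 CLOSE_WAIT\n"
        "newyahoo.WWW  otherhost.51000 65142 0 49896 0 ESTABLISHED\n"
    )
    for host, n in close_wait_by_host(sample).items():
        print(host, n)
```

Feeding it the full, unfiltered netstat output rather than a grep for one client shows at a glance whether other hosts are in the same state.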


Can anyone assist in debugging this?

I would love to have these threads exit without having to manually restart the server.

Thanks
MJ




