We are seeing a very intermittent problem where a request hangs in the server for exact multiples of 30 secs ( we have seen hangs of 30 – 180 secs). This usually involves 2 threads on the same pid hanging. It is in the server, or maybe between the server
and mod_jk, but it is happening in the server tier. At first we suspected connection pool/connection re-use in mod_jk so we eventually disabled both. Connections are now created and dismantled for every request. We can see the packets coming in at time t and
then the request from apache being forwarded to the app server tier at t+n*30. We see no issues with sockets establishing connections, indeed we see the hang before the initial socket SYN is sent. So we now believe that it is not a network issue
This is not a busy server, and so unlikely related to load. vmstat and sar both show nothing going on on the system when this hang occurs. Garbage collection is also not an issue.
With apache logging on level trace8 for the tid with the problem, we see (*** marks removed text) the clear delay between request/response.
[Wed Sep 25 10:51:00.188705 2013] [http:trace4] [pid 28203:tid 140532468623104] http_request.c(305): [client ***:43707] Connection: keep-alive, referer: ****
[Wed Sep 25 10:51:00.188712 2013] [http:trace4] [pid 28203:tid 140532468623104] http_request.c(305): [client ***:43707] If-Modified-Since: Wed, 11 Sep 2013 18:34:32 GMT, ****
[Wed Sep 25 10:51:00.188720 2013] [http:trace4] [pid 28203:tid 140532468623104] http_request.c(305): [client ***:43707] If-None-Match: W/\\"2610-1378924472000\\", referer: ****
[Wed Sep 25 10:51:00.188727 2013] [http:trace4] [pid 28203:tid 140532468623104] http_request.c(305): [client ***:43707] Cache-Control: max-age=0, referer: ***
[Wed Sep 25 10:51:00.188734 2013] [http:trace4] [pid 28203:tid 140532468623104] http_request.c(305): [client ***:43707] X-Forwarded-For: ****
[Wed Sep 25 10:51:00.188767 2013] [rewrite:trace2] [pid 28203:tid 140532468623104] mod_rewrite.c(468): [client ***:43707] 206.108.42.133 - -**** init rewrite engine with requested uri ***
[Wed Sep 25 10:51:00.188784 2013] [rewrite:trace1] [pid 28203:tid 140532468623104] mod_rewrite.c(468): [client ***:43707] 206.108.42.133 - - [****][rid#7fd030072280/initial] pass through ***, referer: ***
[Wed Sep 25 10:51:00.188812 2013] [authz_core:debug] [pid 28203:tid 140532468623104] mod_authz_core.c(828): [client ***:43707] AH01628: authorization result: granted (no directives), referer: ****
[Wed Sep 25 10:51:00.188821 2013] [core:trace3] [pid 28203:tid 140532468623104] request.c(227): [client ***:43707] request authorized without authentication by access_checker_ex hook: ***
[Wed Sep 25 10:51:00.188841 2013] [charset_lite:trace3] [pid 28203:tid 140532468623104] mod_charset_lite.c(211): [client***:43707] uri: *** file: *** method: 0 imt: text/css flags: (null)->(null), referer: ***
[Wed Sep 25 10:51:00.188852 2013] [charset_lite:debug] [pid 28203:tid 140532468623104] mod_charset_lite.c(219): [client ***:43707] AH01448: incomplete configuration: src unspecified, dst unspecified, referer: ***
[Wed Sep 25 10:52:35.693776 2013] [core:trace6] [pid 28172:tid 140532468623104] core_filters.c(525): [client ***:10778] core_output_filter: flushing because of FLUSH bucket
[Wed Sep 25 10:53:30.202645 2013] [deflate:trace1] [pid 28203:tid 140532468623104] mod_deflate.c(490): [client ***:43707] Not compressing very small response of 0 bytes, referer:
***
[Wed Sep 25 10:53:30.202686 2013] [http:trace3] [pid 28203:tid 140532468623104] http_filters.c(963): [client ***:43707] Response sent with status 304, headers:, referer: ***
mod_jk logging at debug: We see evidence of the hang there but no smoking gun. Again just a delay of some multiple of 30 second.
Obviously the multiples of 30 seconds should be a clue (feels like a timeout?) but have not yet been able to tie it out with anything.
We are running on VMWare – apache server 2.4.4 (yes, I know we are going to upgrade to the latest….)
My question is whether anyone else has seen this issue before?
|