Re: Apache "marking down" a back-end server

Suvendu Sekhar Mondal <suv3ndu@xxxxxxxxx> · Thu, 2 Nov 2017 20:11:19 +0530

On Wed, Nov 1, 2017 at 9:09 PM, Suvendu Sekhar Mondal <suv3ndu@xxxxxxxxx> wrote:
> Hello Everyone,
>
> I am seeing an interesting behavior in Apache httpd.
>
> We have multiple Apache httpd instances in front of a set of Tomcat
> JVMs. I found that sometimes *one of the httpds marks one of the JVMs
> down* for 180 sec (the "retry" value). As a result, users logged on to
> that JVM get 5xx errors. At first I suspected that long GCs were
> causing it, but that was not the case. We have a 5 sec "ping" timeout,
> and GCs during the problem period took 500ms-700ms. Also, there were
> plenty of threads available in the JVM to serve new requests. After
> some more drill-down, I found that each of those "mark down" incidents
> correlates with some really long processing (800 sec) on the JVM, which
> exceeds our "ProxyTimeout" and "ttl" limits. Yes, some of the workflows
> of our app can take that much time if they are processing a large
> volume; we are working on it.
>
> My understanding is that these are not "ping" failures, which is the
> case where httpd marks the JVM down. That said, can either a
> "ProxyTimeout" or a "ttl" failure cause httpd to mark the JVM
> down? Or do you think it is something else? Please let me know.
>
> httpd version: 2.4.10
>
> httpd setting:
> ProxyTimeout 300
>
> <Proxy balancer://mycluster>
> ProxySet lbmethod=byrequests
> ProxySet stickysession=JSESSIONID|jsessionid
> ProxySet scolonpathdelim=On
> ProxySet growth=2
> ProxySet nofailover=On
>
> BalancerMember http://abc route=abc keepalive=on ttl=300 ping=5 retry=180
>
> </Proxy>
>
> Excerpts from httpd Error log:
> [Wed Nov 01 08:17:39.221276 2017] [proxy_http:error] [pid 31848:tid
> 9828] (OS 10060)A connection attempt failed because the connected
> party did not properly respond after a period of time, or established
> connection failed because connected host has failed to respond.  :
> [client 10.254.52.48:13964] AH01102: error reading status line from
> remote server abc, referer: xxx
> [Wed Nov 01 08:17:39.221276 2017] [proxy:error] [pid 31848:tid 9828]
> [client 10.254.52.48:13964] AH00898: Timeout on 100-Continue returned
> by /xxx
> [Wed Nov 01 08:17:39.221276 2017] [proxy_balancer:error] [pid
> 31848:tid 9828] [client 10.254.52.48:13964] AH01167:
> balancer://mycluster: All workers are in error state for route (abc),
> referer: xxx
> [Wed Nov 01 08:17:39.346281 2017] [proxy_balancer:error] [pid
> 31848:tid 9760] [client 10.254.52.48:17783] AH01167:
> balancer://mycluster: All workers are in error state for route (abc)

Hello Everyone,

After some investigation I found that Apache is “marking down” a JVM
once ProxyTimeout has elapsed. This is what happens:
 1. A request kicks off a process on a JVM. Let’s assume it is going to
take a long time (10 min) to complete.
 2. While this processing is halfway through, ProxyTimeout (5 min)
elapses.
 3. Apache then ignores the default failontimeout=off setting
and marks the JVM down for the next 180 sec (the retry value).
 4. The problem starts: users logged on to that JVM get 5xx errors.
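
To make the timing in the steps above concrete, here is a minimal Python
sketch of the window during which the JVM would be treated as down. It
assumes, as observed in this thread, that the mark-down happens at the
moment ProxyTimeout elapses and lasts for the retry interval. The helper
name and the simplified model are illustrative assumptions, not httpd's
actual logic.

```python
def mark_down_window(proxy_timeout_s, retry_s, processing_s):
    """Return (start, end) in seconds, relative to the start of the request,
    during which the worker is skipped, or None if the request finishes
    before ProxyTimeout.

    Hypothetical helper modelling the behaviour observed in this thread.
    """
    if processing_s <= proxy_timeout_s:
        return None  # back end answered in time; nothing is marked down
    start = proxy_timeout_s          # worker is put in error state here
    end = proxy_timeout_s + retry_s  # earliest point at which it is retried
    return (start, end)


# Values from this thread: ProxyTimeout 300, retry=180, ~10 min processing.
print(mark_down_window(300, 180, 600))  # (300, 480)
```

With the settings quoted above, the JVM would be unavailable from roughly
5 minutes to 8 minutes after the long request started, even though the
JVM itself is still healthy.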

This behavior looks like a bug to me because:
 - If you force an HTTP GET request to fail by letting ProxyTimeout
elapse, Apache *does not* mark the JVM down. It only fails that
long-running request with a 502 error. That is expected.
 - If you do the same thing with an HTTP POST request, Apache *marks the
JVM down*. This is *NOT* the desired behavior.
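
For anyone who wants to try the GET vs POST comparison without a Tomcat
instance, the following is a rough sketch of a deliberately slow back end
plus a small probe function, written with the Python standard library. It
is only a stand-in for the real application: the names
(make_slow_handler, start_slow_backend, probe) are made up for this
example, and the stand-in may not behave exactly like Tomcat in every
respect.

```python
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_slow_handler(delay_s):
    """Build a handler class that waits delay_s seconds before answering."""

    class SlowHandler(BaseHTTPRequestHandler):
        # Speak HTTP/1.1 so that "Expect: 100-continue" gets an interim reply.
        protocol_version = "HTTP/1.1"

        def _respond(self):
            time.sleep(delay_s)  # simulate long processing on the back end
            body = b"done\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            self._respond()

        def do_POST(self):
            # Read the request body first, then do the slow "processing".
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)
            self._respond()

        def log_message(self, fmt, *args):
            pass  # keep the console quiet

    return SlowHandler


def start_slow_backend(delay_s, port=0):
    """Start the back end in a daemon thread; port=0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_slow_handler(delay_s))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(url, method):
    """Send one GET or POST to url and return the HTTP status code."""
    data = b"x=1" if method == "POST" else None
    req = urllib.request.Request(url, data=data, method=method)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 502 or 503 returned by the proxy
```

One way to use it: start the back end from an interactive Python session
with a delay longer than ProxyTimeout, point a BalancerMember at its
port, then send a GET and a POST through httpd (for example with probe,
using a client timeout longer than ProxyTimeout) and compare what the
httpd error log reports for each.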

I can reproduce the issue with Apache/2.4.25 as well. Can I open a bug
for this behavior, or has it already been resolved? Please let me know.

Thanks!
Suvendu
