On Wed, Nov 1, 2017 at 9:09 PM, Suvendu Sekhar Mondal <suv3ndu@xxxxxxxxx> wrote:
> Hello Everyone,
>
> I am seeing one interesting behavior of Apache httpd.
>
> We have multiple Apache httpds in front of set of Tomcat JVMs. I found
> that sometimes *one of the httpds marking one of the JVMs down* for
> 180 Sec("retry" value). As a result, users logged on that JVM are
> getting 5xx error. First, I suspected that long GCs are causing it but
> it was not the case. We have 5 Sec of "ping" timeout and GCs during
> problem period was 500ms-700ms. Also there were plenty of threads
> available in the JVM to cater new requests. After some more drill-down
> it was found that each of those "mark down" incidents are correlated
> with some really long processing(800 Sec) on JVM which surpasses our
> "ProxyTimeout" and "ttl" limits. Yes, some of the workflows of our app
> can take that much time if they are processing large volume - we are
> working on it.
>
> My understanding is, these are not "ping" failure case where httpd
> marks the JVM down. Being said that, can it happen that either
> "ProxyTimeout" or "ttl" failure instructing httpd to mark the JVM
> down? Or, do you think it is something else? Please let me know.
>
> httpd version: 2.4.10
>
> httpd setting:
> ProxyTimeout 300
>
> <Proxy balancer://mycluster>
> ProxySet lbmethod=byrequests
> ProxySet stickysession=JSESSIONID|jsessionid
> ProxySet scolonpathdelim=On
> ProxySet growth=2
> ProxySet nofailover=On
>
> BalancerMember http://abc route=abc keepalive=on ttl=300 ping=5 retry=180
>
> </proxy>
>
> Excerpts from httpd Error log:
> [Wed Nov 01 08:17:39.221276 2017] [proxy_http:error] [pid 31848:tid
> 9828] (OS 10060)A connection attempt failed because the connected
> party did not properly respond after a period of time, or established
> connection failed because connected host has failed to respond.
> : [client 10.254.52.48:13964] AH01102: error reading status line from
> remote server abc, referer: xxx
> [Wed Nov 01 08:17:39.221276 2017] [proxy:error] [pid 31848:tid 9828]
> [client 10.254.52.48:13964] AH00898: Timeout on 100-Continue returned
> by /xxx
> [Wed Nov 01 08:17:39.221276 2017] [proxy_balancer:error] [pid
> 31848:tid 9828] [client 10.254.52.48:13964] AH01167:
> balancer://mycluster: All workers are in error state for route (abc),
> referer: xxx
> [Wed Nov 01 08:17:39.346281 2017] [proxy_balancer:error] [pid
> 31848:tid 9760] [client 10.254.52.48:17783] AH01167:
> balancer://mycluster: All workers are in error state for route (abc)

Hello Everyone,

After some investigation, I found that Apache marks a JVM down once
ProxyTimeout elapses. This is what happens:

1. A request kicks off a process on a JVM. Let's assume it takes a
   long time (10 min) to complete.
2. While that processing is halfway through, ProxyTimeout (5 min)
   elapses.
3. Apache then ignores the default failontimeout=off setting and marks
   the JVM down for the next 180 sec (the retry value).
4. The problem starts.

This behavior looks like a bug(?) to me because:

- If you force an HTTP GET request to fail by letting ProxyTimeout
  elapse, Apache does *not* mark the JVM down. It only fails that
  long-running request with a 502 error. That is expected.
- If you do the same with an HTTP POST request, Apache *marks the JVM
  down*. This is *not* the desired behavior.

I can reproduce the issue with Apache/2.4.25 as well. Can I open a bug
for this behavior, or is it already resolved? Please let me know.

Thanks!
Suvendu
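
A minimal sketch of a deliberately slow backend for reproducing the GET
versus POST difference described above, assuming Python 3.7+ is
available on a test host. It is not part of the original setup:
`make_slow_server`, the host, the port, and the delay are hypothetical
names and values chosen for illustration. The delay only needs to
exceed ProxyTimeout.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_slow_server(port, delay_seconds, host="127.0.0.1"):
    """Return an HTTP server that answers GET and POST after a fixed delay.

    Hypothetical helper for testing only; it stands in for a backend
    whose processing time exceeds ProxyTimeout.
    """

    class SlowHandler(BaseHTTPRequestHandler):
        def _respond_slowly(self):
            # Read the request body (if any) before sleeping, so the
            # delay models slow processing rather than a slow upload.
            length = int(self.headers.get("Content-Length", 0))
            if length:
                self.rfile.read(length)
            time.sleep(delay_seconds)
            body = b"done\n"
            try:
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            except (BrokenPipeError, ConnectionResetError):
                # The proxy gave up before the delay ended; nothing to send.
                pass

        def do_GET(self):
            self._respond_slowly()

        def do_POST(self):
            self._respond_slowly()

        def log_message(self, format, *args):
            # Suppress the default per-request logging to stderr.
            pass

    return ThreadingHTTPServer((host, port), SlowHandler)
```

One way to use it: run `make_slow_server(8080, 600).serve_forever()` on
a test host, point a BalancerMember at that port, then send one GET and
one POST through httpd and compare the balancer state after
ProxyTimeout elapses.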