On Sat, Jul 30, 2016 at 8:36 AM, dE <de.techno@xxxxxxxxx> wrote:
>
> Yes, as you said I did try that before --

Your previous configuration had timeout=10 on the ProxyPass line, not
on the BalancerMember one as expected.

>
> BalancerMember balancer://localbalance/ http://[fc00::1:4]/ timeout=10
> ProxyPass / balancer://localbalance/ failontimeout=on
>
> I even tried the retry (set to 600) and forcerecovery parameter.

Really, without the timeout=10 on the ProxyPass (like the
configuration you mentioned in your previous message)?

>
> Have you ever tried this in the tests that I'm doing (SIGSTOP apache)?

Actually yes, with that exact configuration:

<VirtualHost *:8080>
    ServerName localhost:8080

    ProxyPass / balancer://localbalance/ failontimeout=on failonstatus=502 forcerecovery=off
    BalancerMember balancer://localbalance/ http://127.0.0.1:80/ timeout=10 retry=30
</VirtualHost>

1. The backend is initially not started:

$ time nc localhost 8080 <<\EOF
GET / HTTP/1.1
Host: localhost:8080

EOF
HTTP/1.1 503 Service Unavailable
Date: Sat, 30 Jul 2016 10:33:29 GMT
Server: Apache/2.4.24-dev (Unix) OpenSSL/1.0.2g
[...]

real 0m0.005s
user 0m0.000s
sys 0m0.000s

$ tail -3 error_log
[Sat Jul 30 12:33:29.725314 2016] [proxy:error] [pid 14463:tid 139636299003648] (111)Connection refused: AH00957: HTTP: attempt to connect to 127.0.0.1:80 (127.0.0.1) failed
[Sat Jul 30 12:33:29.725392 2016] [proxy:error] [pid 14463:tid 139636299003648] AH00959: ap_proxy_connect_backend disabling worker for (127.0.0.1) for 30s
[Sat Jul 30 12:33:29.725415 2016] [proxy_http:error] [pid 14463:tid 139636299003648] [client ::1:50006] AH01114: HTTP: failed to make connection to backend: 127.0.0.1

2. The backend is still not started, ~6 seconds later (i.e. < 30s):

$ time nc localhost 8080 <<\EOF
GET / HTTP/1.1
Host: localhost:8080

EOF
HTTP/1.1 503 Service Unavailable
Date: Sat, 30 Jul 2016 10:33:35 GMT
Server: Apache/2.4.24-dev (Unix) OpenSSL/1.0.2g
[...]

real 0m0.004s
user 0m0.000s
sys 0m0.000s

$ tail -1 error_log
[Sat Jul 30 12:33:35.574164 2016] [proxy_balancer:error] [pid 14463:tid 139636381173504] [client ::1:50010] AH01170: balancer://localbalance: All workers are in error state

3. The backend is now started, ~15s later (i.e. still < 30s):

$ time nc localhost 8080 <<\EOF
GET / HTTP/1.1
Host: localhost:8080

EOF
HTTP/1.1 503 Service Unavailable
Date: Sat, 30 Jul 2016 10:33:44 GMT
Server: Apache/2.4.24-dev (Unix) OpenSSL/1.0.2g
[...]

real 0m0.004s
user 0m0.000s
sys 0m0.000s

$ tail -1 error_log
[Sat Jul 30 12:33:44.327913 2016] [proxy_balancer:error] [pid 14463:tid 139636282218240] [client ::1:50012] AH01170: balancer://localbalance: All workers are in error state

4. The backend is still running normally, ~1 min later (i.e. > 30s):

$ time nc localhost 8080 <<\EOF
GET / HTTP/1.1
Host: localhost:8080

EOF
HTTP/1.1 200 OK
Date: Sat, 30 Jul 2016 10:34:40 GMT
Server: Apache
[...]

real 0m0.006s
user 0m0.004s
sys 0m0.000s

5. The backend is SIGSTOP-ed (no VM used, so "ServerLimit 1" and
kill -STOP the only child):

$ time nc localhost 8080 <<\EOF
GET / HTTP/1.1
Host: localhost:8080

EOF
HTTP/1.1 502 Proxy Error
Date: Sat, 30 Jul 2016 10:34:51 GMT
Server: Apache/2.4.24-dev (Unix) OpenSSL/1.0.2g
[...]

real 0m10.015s
user 0m0.000s
sys 0m0.004s

$ tail -4 error_log
[Sat Jul 30 12:35:01.355768 2016] [proxy_http:error] [pid 14463:tid 139636273825536] (70007)The timeout specified has expired: [client ::1:50018] AH01102: error reading status line from remote server 127.0.0.1:80
[Sat Jul 30 12:35:01.355849 2016] [proxy:error] [pid 14463:tid 139636273825536] [client ::1:50018] AH00898: Error reading from remote server returned by /
[Sat Jul 30 12:35:01.355942 2016] [proxy_balancer:error] [pid 14463:tid 139636273825536] [client ::1:50018] AH01174: balancer://localbalance: Forcing worker (http://127.0.0.1/) into error state due to status code 502 matching 'failonstatus' balancer parameter
[Sat Jul 30 12:35:01.355964 2016] [proxy_balancer:error] [pid 14463:tid 139636273825536] [client ::1:50018] AH02460: balancer://localbalance: Forcing worker (http://127.0.0.1/) into error state due to timeout and 'failontimeout' parameter being set

6. The backend is still SIGSTOP-ed, ~25s later (i.e. < 30s):

$ time nc localhost 8080 <<\EOF
GET / HTTP/1.1
Host: localhost:8080

EOF
HTTP/1.1 503 Service Unavailable
Date: Sat, 30 Jul 2016 10:35:26 GMT
Server: Apache/2.4.24-dev (Unix) OpenSSL/1.0.2g
[...]

real 0m0.004s
user 0m0.000s
sys 0m0.000s

$ tail -1 error_log
[Sat Jul 30 12:35:26.797383 2016] [proxy_balancer:error] [pid 14463:tid 139636265432832] [client ::1:50022] AH01170: balancer://localbalance: All workers are in error state

So everything works as expected for me: timeouts are always 10s max,
and the worker never wakes up before the retry= period has elapsed.

Regards,
Yann.
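
The timing in steps 2, 3, 4 and 6 can be checked with a little arithmetic. The sketch below is only a simplified model, not httpd code: it assumes a worker that entered the error state at some time is refused until more than retry= seconds have passed since then. The names `worker_retry_allowed` and `t` are made up for illustration; the timestamps are copied from the error_log entries and Date headers above (the latter shifted to local time).

```python
from datetime import datetime, timedelta

# retry=30 from the BalancerMember line in the test configuration
RETRY = timedelta(seconds=30)


def worker_retry_allowed(error_time, now, retry=RETRY):
    """Simplified model (an assumption, not httpd's actual code):
    a worker in error state is tried again only once more than
    `retry` has elapsed since it was put in that state."""
    return now - error_time > retry


def t(hms):
    """Parse a local HH:MM:SS time on the day of the test."""
    return datetime.strptime("2016-07-30 " + hms, "%Y-%m-%d %H:%M:%S")


first_error = t("12:33:29")   # step 1: connection refused
second_error = t("12:35:01")  # step 5: timeout on the SIGSTOP-ed backend

checks = [
    ("step 2", first_error, t("12:33:35")),
    ("step 3", first_error, t("12:33:44")),
    ("step 4", first_error, t("12:34:40")),
    ("step 6", second_error, t("12:35:26")),
]

for label, error_time, now in checks:
    elapsed = int((now - error_time).total_seconds())
    state = "retried" if worker_retry_allowed(error_time, now) else "still in error"
    print(f"{label}: {elapsed}s after error -> {state}")
```

Under that assumption only step 4 falls outside the 30s window, which is consistent with the single 200 response observed above; the other three requests get an immediate 503.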