The mirrorlists are falling over - haproxy keeps marking app servers as down, and some requests are getting HTTP 503 Service Unavailable responses. This happens every 10 minutes, for 2-3 minutes, as several thousand EC2 instances request the mirrorlist again. For reference, we're seeing a spike of over 2000 simultaneous requests across our 6 proxy and 4 app servers, occurring every 10 minutes, then dropping back down to under 20 simultaneous requests in between.

I'm trying out several things:

1) Increase the number of mirrorlist WSGI processes on each app server from 45 to 100. This is the maximum number of simultaneous mirrorlist requests each server can serve. I've tried this value on app01, and running this many still keeps the mirrorlist_server back end (which fork()s on each connection) humming right along, so I think this is safe. Much beyond this, though, the app servers will start to swap, which we must avoid. We can watch for swapping and, if it starts, lower this value somewhat. The value was 6 just a few days ago, which wasn't working either. This gives us 400 slots to work with across the app servers.

2) Limit the number of connections from each proxy server to each app server to 25. Right now we're seeing a max of between 60 and 135 simultaneous requests from each proxy server to each app server. Requests over the 25 will get queued by haproxy and then served as app server slots become available. I did this on proxy03, and it really helped the app servers and kept them humming, though there were still some longish response times (some >30 seconds). We're still oversubscribing app server slots here (6 proxies x 25 connections each is 150 potential connections per app server, against 100 WSGI slots), but oddly not by as much as you'd think, as proxy03 is taking 40% of the incoming requests itself for some reason.

3) Bump the haproxy timeout up to 60 seconds. 5 seconds (the global default) is way too low when we get the spikes. It was causing haproxy to think app servers were down and start sending their load to the other app servers, which would then overload, and then to the first backup server, and so on. Let's be nicer: if during a spike it takes 60 seconds to get an answer, or to be told HTTP 503, so be it.

4) Have haproxy use all the backup servers when all the app servers are marked down (option allbackups). Right now it sends all the requests to a single backup server, and if that's down, all to the next backup server, etc. We know one server can't handle the load (even 4 aren't really), so don't overload a single backup either.

5) The default mirrorlist_server listen backlog is only 5, meaning that at most 5 WSGI clients get queued up if all the children are busy. To handle the spikes, bump that to 300 (though the kernel caps it at 128 by default). An earlier change had this intent, but the code was buggy; there's a sketch of why right after this list, and the fix is PATCH 1/2 below.

6) Bug fix to mirrorlist_server so it no longer ignores SIGCHLD (PATCH 2/2 below). Amazing this ever worked in the first place. This should resolve the problem where mirrorlist_server slows down and its memory grows over time; a sketch of the failure mode also follows the list.
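To make item 5 concrete, here's a minimal sketch of why the earlier attempt didn't work -- my illustration, not the MirrorManager code, assuming the Python 2 SocketServer module that mirrorlist_server builds on. TCPServer.__init__() (which UnixStreamServer inherits) calls server_activate(), and server_activate() passes self.request_queue_size straight to socket.listen(), so the backlog has to be set before the server object is constructed. Assigning it on the instance afterwards, as the old code did, never reaches the kernel; a class attribute does, which is what PATCH 1/2 below changes:

    # Minimal sketch, assuming Python 2's SocketServer module.
    # TCPServer.__init__() ends by calling server_activate(), which runs
    #     self.socket.listen(self.request_queue_size)
    # so the value must already be set when __init__ executes.  A class
    # attribute overrides TCPServer's default of 5 in time; an instance
    # attribute assigned after construction arrives too late.
    from SocketServer import ForkingMixIn, UnixStreamServer

    class ForkingUnixStreamServer(ForkingMixIn, UnixStreamServer):
        request_queue_size = 300  # in place before listen() is called

Note that the kernel clamps any listen() backlog to net.core.somaxconn (128 by default), so that sysctl has to be raised as well before the full 300 takes effect.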
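Likewise for item 6, a standalone demonstration of the failure mode -- again my sketch, not the server code. With SIGCHLD set to SIG_IGN, the kernel reaps exited children itself, so the parent's own os.waitpid() calls fail with ECHILD. SocketServer's ForkingMixIn.collect_children() treats that error as "no child finished" and never removes pids from active_children, so the list (and the parent's memory) grows without bound:

    # Standalone sketch: why a forking parent that reaps its own
    # children must not ignore SIGCHLD.  Python 2 syntax, to match
    # mirrorlist_server.
    import os
    import signal
    import time

    signal.signal(signal.SIGCHLD, signal.SIG_IGN)  # kernel auto-reaps

    pid = os.fork()
    if pid == 0:
        os._exit(0)        # child exits immediately
    time.sleep(0.5)        # give the kernel time to reap it

    try:
        os.waitpid(pid, 0)  # parent's own reap attempt
    except OSError, e:
        # ECHILD: the child was already reaped behind our back, so
        # collect_children() would leave its pid in active_children
        # forever.
        print "waitpid failed:", e

Dropping the SIG_IGN line (PATCH 2/2 below) restores the default disposition, so waitpid() in collect_children() works again and the parent actually reaps its children.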
diff --git a/modules/haproxy/files/haproxy.cfg b/modules/haproxy/files/haproxy.cfg
index 6e538ed..5a6fda0 100644
--- a/modules/haproxy/files/haproxy.cfg
+++ b/modules/haproxy/files/haproxy.cfg
@@ -43,15 +43,17 @@ listen fp-wiki 0.0.0.0:10001
 listen mirror-lists 0.0.0.0:10002
     balance hdr(appserver)
-    server app1 app1:80 check inter 5s rise 2 fall 3
-    server app2 app2:80 check inter 5s rise 2 fall 3
-    server app3 app3:80 check inter 5s rise 2 fall 3
-    server app4 app4:80 check inter 5s rise 2 fall 3
-    server app5 app5:80 backup check inter 10s rise 2 fall 3
-    server app6 app6:80 backup check inter 10s rise 2 fall 3
-    server app7 app7:80 check inter 5s rise 2 fall 3
-    server bapp1 bapp1:80 backup check inter 5s rise 2 fall 3
+    timeout connect 60s
+    server app1 app1:80 check inter 5s rise 2 fall 3 maxconn 25
+    server app2 app2:80 check inter 5s rise 2 fall 3 maxconn 25
+    server app3 app3:80 check inter 5s rise 2 fall 3 maxconn 25
+    server app4 app4:80 check inter 5s rise 2 fall 3 maxconn 25
+    server app5 app5:80 backup check inter 10s rise 2 fall 3 maxconn 25
+    server app6 app6:80 backup check inter 10s rise 2 fall 3 maxconn 25
+    server app7 app7:80 check inter 5s rise 2 fall 3 maxconn 25
+    server bapp1 bapp1:80 backup check inter 5s rise 2 fall 3 maxconn 25
     option httpchk GET /mirrorlist
+    option allbackups

 listen pkgdb 0.0.0.0:10003
     balance hdr(appserver)

diff --git a/modules/mirrormanager/files/mirrorlist-server.conf b/modules/mirrormanager/files/mirrorlist-server.conf
index fd7cf98..482f7af 100644
--- a/modules/mirrormanager/files/mirrorlist-server.conf
+++ b/modules/mirrormanager/files/mirrorlist-server.conf
@@ -7,7 +7,7 @@ Alias /publiclist /var/lib/mirrormanager/mirrorlists/publiclist/
     ExpiresDefault "modification plus 1 hour"
 </Directory>

-WSGIDaemonProcess mirrorlist user=apache processes=45 threads=1 display-name=mirrorlist maximum-requests=1000
+WSGIDaemonProcess mirrorlist user=apache processes=100 threads=1 display-name=mirrorlist maximum-requests=1000
 WSGIScriptAlias /metalink /usr/share/mirrormanager/mirrorlist-server/mirrorlist_client.wsgi
 WSGIScriptAlias /mirrorlist /usr/share/mirrormanager/mirrorlist-server/mirrorlist_client.wsgi

From 45d401446bfecba768fdf4f26409bf291172f7bc Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch@xxxxxxxx>
Date: Mon, 10 May 2010 15:23:57 -0500
Subject: [PATCH 1/2] mirrorlist_server: set request_queue_size earlier

While the docs say that request_queue_size can be a per-instance value,
in reality it's used during ForkingUnixStreamServer __init__, meaning it
needs to override the default class attribute instead.  Moving this up
means that connections aren't blocking after about 5 are already running
(default), and mirrorlist_client can now connect in ~200us like one
would expect, rather than seconds or tens of seconds like we were seeing
when lots (say, 40+) clients were connecting simultaneously.
---
 mirrorlist-server/mirrorlist_server.py |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mirrorlist-server/mirrorlist_server.py b/mirrorlist-server/mirrorlist_server.py
index 8825a1a..2ade357 100755
--- a/mirrorlist-server/mirrorlist_server.py
+++ b/mirrorlist-server/mirrorlist_server.py
@@ -725,6 +725,7 @@ def sighup_handler(signum, frame):
 signal.signal(signal.SIGHUP, sighup_handler)

 class ForkingUnixStreamServer(ForkingMixIn, UnixStreamServer):
+    request_queue_size = 300
     def finish_request(self, request, client_address):
         signal.signal(signal.SIGHUP, signal.SIG_IGN)
         BaseServer.finish_request(self, request, client_address)
@@ -815,7 +816,6 @@ def main():
     signal.signal(signal.SIGHUP, sighup_handler)
     signal.signal(signal.SIGCHLD, signal.SIG_IGN)
     ss = ForkingUnixStreamServer(socketfile, MirrorlistHandler)
-    ss.request_queue_size = 300
     ss.serve_forever()

 try:
-- 
1.7.0.1

From d82f20b10c755e5ce40d67ca7ea4a6dba9e37d34 Mon Sep 17 00:00:00 2001
From: Matt Domsch <Matt_Domsch@xxxxxxxx>
Date: Mon, 10 May 2010 23:56:09 -0500
Subject: [PATCH 2/2] mirrorlist_server: don't ignore SIGCHLD

Amazing that this ever worked in the first place.  Ignoring SIGCHLD
causes the parent's active_children list to grow without bound.  This
is also probably the cause of our long-term memory size growth.  The
parent really needs to catch SIGCHLD in order to do its reaping.
---
 mirrorlist-server/mirrorlist_server.py |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mirrorlist-server/mirrorlist_server.py b/mirrorlist-server/mirrorlist_server.py
index 2ade357..0de7132 100755
--- a/mirrorlist-server/mirrorlist_server.py
+++ b/mirrorlist-server/mirrorlist_server.py
@@ -814,7 +814,6 @@ def main():
     open_geoip_databases()
     read_caches()
     signal.signal(signal.SIGHUP, sighup_handler)
-    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
     ss = ForkingUnixStreamServer(socketfile, MirrorlistHandler)
     ss.serve_forever()
-- 
1.7.0.1

-- 
Matt Domsch
Technology Strategist
Dell | Office of the CTO