So we've got some ongoing issues in our environment right now. Some of them are unrelated, and all are being worked on, but there is one in particular I want to discuss on the list because I just don't know what changed.

The problem: When one of the fas servers goes offline, most of our other apps get taken offline.

The way it is supposed to work: When one of our fas servers goes offline, haproxy takes it out of the farm and sends all requests to the fas server that is still online. This may generate a few errors for a short time, but it generally goes unnoticed.

The details: When one of the fas servers goes offline, haproxy hangs on that connection. The application servers hang, or possibly try to re-use connections to fas, which causes the number of httpd processes on the app servers to skyrocket until they hit MaxClients and everything goes offline. This happens fairly quickly, in a matter of seconds. It seems that even after haproxy flags the fas server as dead (which takes about 15s), any connections that were open to the old server at that time aren't killed. They just hang.

Outstanding questions: What changed? Does python-fedora (our primary interface to fas) now do something differently with keepalive? Is anyone else using haproxy seeing this same issue?

I've still got redispatch enabled.

-Mike
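
To illustrate the failure mode described above, here is a minimal Python sketch of a client talking to a backend that accepts connections but never answers. It is not python-fedora code and says nothing about how haproxy behaves internally; the helper names (start_hung_server, fetch_with_timeout) and the timeout values are hypothetical, chosen only for the demonstration. The point is that a request with a bounded timeout returns promptly, whereas one without a timeout would hold its worker for as long as the dead connection stays open.

```python
import socket
import threading
import time


def start_hung_server():
    """Start a local TCP server that accepts connections but never replies.

    Hypothetical stand-in for a backend that has gone unresponsive.
    Returns the listening socket and the port it is bound to.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(5)
    held = []  # keep accepted sockets open so the client sees silence, not a reset

    def accept_loop():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # listening socket was closed
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, srv.getsockname()[1]


def fetch_with_timeout(host, port, timeout):
    """Send a request and wait for a reply for at most `timeout` seconds.

    Returns the reply bytes, or None if the backend could not be reached
    or did not answer in time.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
            return sock.recv(4096)
    except OSError:  # socket.timeout is a subclass of OSError
        return None


if __name__ == "__main__":
    srv, port = start_hung_server()
    start = time.monotonic()
    reply = fetch_with_timeout("127.0.0.1", port, timeout=0.5)
    elapsed = time.monotonic() - start
    print("reply:", reply, "after", round(elapsed, 2), "seconds")
    srv.close()
```

With a timeout in place, each worker is released after a bounded wait instead of piling up behind the dead server. Whether the clients in this setup actually use such a timeout is exactly the open question.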