Hi all

[I'm re-reading this, and it is a bit of a convoluted setup - I appreciate any eyes that read this!]

Hardware: 2 x Dell 2850, 2 x Xeon 5140 2.33 GHz, 4 GB RAM
OS: FreeBSD 7.1-RELEASE-p4
Server version: Apache/2.2.22 (FreeBSD)
Server built: Feb 13 2012 22:29:44

At $JOB we use Apache as a reverse proxy: we have a pair of servers to which all our web requests are round-robin routed. These servers provide SSL termination, serve static content, and reverse proxy to backend servers for dynamic content. We in fact run two httpd instances on each server: one using the worker MPM to provide the SSL termination, and one using the event MPM to serve static content and reverse-proxied content.

First off, I'm not a network guy; I can find out more about this routing stuff if you think it's relevant. IIRC it works like this: both boxes have all the public IP addresses for our websites allocated on the loopback interface, and the edge routers round-robin requests to a pair of CARP/VRRP IP addresses on the Apache boxes. By controlling which box has which CARP address, we can control which box(es) receive traffic.

Our problems started when we put all traffic through one box whilst we upgraded to 2.2.22. Some of our websites are served through a CDN, and we could observe from our office that a significant proportion of requests going via the CDN never reached our server; we can see from our squid proxy log that requests were made which Apache never received or recorded. We also had reports from clients and users that the websites (even non-CDN sites) were subjectively 'slow' once we were operating on just one box. We think these were requests failing to reach our server and subsequently being retried. We can quite clearly identify when this happens with sites served from the CDN, as each timeout results in the CDN returning a 503 to us, which we can detect in our squid proxy logs and use to track how frequently it happens. When all the traffic was put through one of the Apache frontend proxies, the error rate we could detect was five times higher than when we spread the load across both frontend proxies.

So this led us to look at the listen queue:

    $ netstat -s 2>/dev/null | grep listen
            27004 listen queue overflows

We immediately thought we had found the issue, and increased the queue length from 128 (the FreeBSD default) to 511. No more listen queue overflows (yay), but requests are still not making it to Apache (boo).
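For anyone who wants to see exactly what we changed there, this is roughly it (from memory, so treat the values as approximate; kern.ipc.somaxconn is the knob on this vintage of FreeBSD):

    # /etc/sysctl.conf - raise the kernel's accept queue limit (default 128)
    kern.ipc.somaxconn=511

    # httpd.conf - have Apache request a matching backlog on its listen sockets
    ListenBacklog 511

We raised both because, as I understand it, the backlog Apache passes to listen(2) is silently capped at the kernel limit.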
So, finally, we looked at traffic load. Averaged over an entire weekday we have a moderate/low load: a mean of 40 requests/second, with a peak of ~500-600 requests/second. We've configured the event MPM with these settings:

    <IfModule mpm_event_module>
        StartServers          4
        MaxClients          480
        MinSpareThreads      30
        MaxSpareThreads      80
        ThreadsPerChild      40
        MaxRequestsPerChild   0
    </IfModule>

Clearly, if we are to handle a peak of ~1000 requests/second (leaving some headroom), this configuration needs a tweak. Where we are confused is how these numbers relate to the number of processes, threads and workers. Am I right in thinking that MaxClients and {Min,Max}SpareThreads are global configuration? I.e., if there are only 39 spare threads across all processes, Apache will create a new child process with ThreadsPerChild threads? If that is right, does this configuration seem sane?

    StartServers          8
    MaxClients         1024
    MinSpareThreads     128
    MaxSpareThreads     512
    ThreadsPerChild      64
    MaxRequestsPerChild   0

By my calculations, this would initially give us 8 children, each with 64 threads, for a total of 512 workers. If there are ever fewer than 128 spare workers, Apache will spawn additional processes, up to a maximum of 16 (ServerLimit), giving us a maximum of 1024 workers. Is it wise to have 64 threads in a single child, or should I spread them around a bit more? There is at least ~2 GB of free RAM.

Finally (seriously, anyone still reading this, thank you!), can anyone explain what mod_status displays? Its scoreboard consists of 16 rows of entries, each row being 64 characters long. I assume each row corresponds to a child (or potential child), and each entry on that row to a worker thread in that child. So why 64 entries? Our config says 40 threads per child, not 64…

Any help with any of these problems is greatly appreciated!

Cheers

Tom
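P.S. In case it helps anyone check the arithmetic above, this is the model I'm assuming (please correct me if it's wrong):

    initial workers = StartServers x ThreadsPerChild =  8 x 64 =  512
    maximum workers = ServerLimit  x ThreadsPerChild = 16 x 64 = 1024  (= MaxClients)

I'm relying on ServerLimit defaulting to 16 here; if MaxClients / ThreadsPerChild ever needed to exceed 16, I assume we'd have to raise ServerLimit explicitly. And on the scoreboard question: I wonder whether the 64-wide rows simply reflect ThreadLimit (which I believe defaults to 64) rather than ThreadsPerChild - can anyone confirm?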