Hi all

[I'm re-reading this, and it is a bit of a convoluted setup - I appreciate any eyes that read this!]

Hardware: 2 x Dell 2850, 2 x Xeon 5140 2.33 GHz, 4 GB RAM
OS: FreeBSD 7.1-RELEASE-p4
Server version: Apache/2.2.22 (FreeBSD)
Server built: Feb 13 2012 22:29:44

At $JOB we use Apache as a reverse proxy: we have a pair of servers to which all our web requests are round-robin routed. These servers provide SSL termination, serve static content, and reverse proxy to backend servers for dynamic content. We in fact run two httpd instances on each server: one using the worker MPM to provide the SSL termination, and one using the event MPM to serve static content and reverse-proxied content.

First off, I'm not a network guy; I can find out more about this routing stuff if you think it's relevant. IIRC it works like this: both boxes have all the public IP addresses for our websites allocated on the loopback interface, and the edge routers round-robin requests to a pair of CARP/VRRP IP addresses on the Apache boxes. By controlling which box has which CARP address, we can control which box(es) receive traffic.

Our problems started when we put all traffic through one box whilst we upgraded to 2.2.22. Some of our websites are served through a CDN, and we could observe from our office that a significant proportion of requests going via the CDN never reached our server; we can see from our squid proxy log that requests were made which Apache never received or recorded. We also had reports from clients and users that the websites (even non-CDN sites) were subjectively 'slow' once we were operating on just one box. We think these were requests failing to reach our server and subsequently being retried. We can quite clearly identify when this happens with sites served from the CDN, as each timeout results in the CDN returning a 503 to us, which we can detect in our squid proxy logs and use to track how frequently it happens. When all the traffic was put through one of the Apache frontend proxies, the error rate we could detect was five times higher than when we spread the load across both frontend proxies.

So this led us to look at the listen queue:

    $ netstat -s 2>/dev/null | grep listen
            27004 listen queue overflows

We immediately thought we had found the issue, and increased the queue length from 128 (the FreeBSD default) to 511. No more listen queue overflows (yay), but requests are still not making it to Apache (boo).
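For anyone who wants to see exactly what we changed there, this is roughly it (from memory, so treat the values as approximate; kern.ipc.somaxconn is the knob on this vintage of FreeBSD):

    # /etc/sysctl.conf - raise the kernel's accept queue limit (default 128)
    kern.ipc.somaxconn=511

    # httpd.conf - have Apache request a matching backlog on its listen sockets
    ListenBacklog 511

We raised both because, as I understand it, the backlog Apache passes to listen(2) is silently capped at the kernel limit.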
So, finally, we looked at traffic load. Averaged over an entire weekday we have a moderate/low load: a mean of 40 requests/second, with a peak of ~500-600 requests/second. We've configured the event MPM with these settings:

    <IfModule mpm_event_module>
        StartServers          4
        MaxClients          480
        MinSpareThreads      30
        MaxSpareThreads      80
        ThreadsPerChild      40
        MaxRequestsPerChild   0
    </IfModule>

Clearly, if we are to handle a peak of ~1000 requests/second (leaving some headroom), this configuration needs a tweak. Where we are confused is how these numbers relate to the number of processes, threads and workers. Am I right in thinking that MaxClients and {Min,Max}SpareThreads are global configuration? I.e., if there are only 39 spare threads across all processes, Apache will create a new child process with ThreadsPerChild threads? If that is right, does this configuration seem sane?

    StartServers          8
    MaxClients         1024
    MinSpareThreads     128
    MaxSpareThreads     512
    ThreadsPerChild      64
    MaxRequestsPerChild   0

By my calculations, this would initially give us 8 children, each with 64 threads, for a total of 512 workers. If there are ever fewer than 128 spare workers, Apache will spawn additional processes, up to a maximum of 16 (ServerLimit), giving us a maximum of 1024 workers. Is it wise to have 64 threads in a single child, or should I spread them around a bit more? There is at least ~2 GB of free RAM.

Finally (seriously, anyone still reading this, thank you!), can anyone explain what mod_status displays? Its scoreboard consists of 16 rows of entries, each row being 64 characters long. I assume each row corresponds to a child (or potential child), and each entry on that row to a worker thread in that child. So why 64 entries? Our config says 40 threads per child, not 64…

Any help with any of these problems is greatly appreciated!

Cheers

Tom
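P.S. In case it helps anyone check the arithmetic above, this is the model I'm assuming (please correct me if it's wrong):

    initial workers = StartServers x ThreadsPerChild =  8 x 64 =  512
    maximum workers = ServerLimit  x ThreadsPerChild = 16 x 64 = 1024  (= MaxClients)

I'm relying on ServerLimit defaulting to 16 here; if MaxClients / ThreadsPerChild ever needed to exceed 16, I assume we'd have to raise ServerLimit explicitly. And on the scoreboard question: I wonder whether the 64-wide rows simply reflect ThreadLimit (which I believe defaults to 64) rather than ThreadsPerChild - can anyone confirm?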