We run four squid 2.7-STABLE7 reverse proxies. Two have 8GB of memory with cache_mem set to 2GB, one has 16GB with cache_mem set to 4GB, and one has 24GB with cache_mem set to 6GB. All machines are 64-bit FreeBSD 7.2. On all machines we have:

    cache_dir null /null
    url_rewrite_program /usr/local/sbin/squid_redir
    url_rewrite_children 16

(squid_redir is a C program that uses about 16MB of address space and 3MB of resident space per process.)

In all cases, the "steady state" memory usage of squid is, at most times, about double each machine's cache_mem setting. I understand that squid has overhead and that cache_mem controls only part of its memory usage, but it seems odd that the overhead is always almost exactly 100% of cache_mem. Squid itself doesn't report all of that memory as in use, as this example from the 16GB (cache_mem 4GB) machine shows:

    Resource usage for squid:
            UP Time:        149903.139 seconds
            CPU Time:       5885.643 seconds
            CPU Usage:      3.93%
            CPU Usage, 5 minute avg:        2.71%
            CPU Usage, 60 minute avg:       3.02%
            Process Data Segment Size via sbrk(): 92 KB
            Maximum Resident Size: 8804588 KB
            Page faults with physical i/o: 1
    Memory accounted for:
            Total accounted:       4405418 KB
            memPoolAlloc calls: 1249661983
            memPoolFree calls:  1245286664

On all these systems, no matter what cache_mem setting is used, "Maximum Resident Size" is just about exactly double "Total accounted" (here, 8804588 KB / 4405418 KB is almost exactly 2.0). So I wonder what the rest of the RAM is doing. There is no index of what's on disk, since there is no disk, and according to the memory-sizing FAQ the overhead beyond cache_mem should only be "about 10-20MB". Even assuming that figure was written for much smaller caches, it could be off by two orders of magnitude and still not account for what we are seeing. 2 x cache_mem just doesn't feel right. If it is not right, I would like to fix it, or at least understand where the rest of the memory is going. If this memory usage is normal, we will cope with it.

The real problem comes whenever the url_rewrite programs get respawned. For each rewriter, squid does a fork()/exec() pair in ipc.c (see the sketch at the end of this message). The fork() step creates a second copy of the process, duplicating its entire address space. One would expect copy-on-write behavior, but that appears not to be the case. Not only does copying that much RAM take a nontrivial amount of time on the larger systems, but if the system does not have enough RAM to hold both copies, there is horrible paging and 100% CPU usage from squid (both the parent and each spawned child) for about 10-20 minutes while it sorts itself out, during which time squid is unresponsive. I also note that after such an incident, once the steady state returns, the process is still 2 x cache_mem in size, but half of it has been paged out, and there is *no* paging activity on the system until the next such incident. That makes it feel like a fixed-size memory leak.

What all this means is that we can use only about 25% of system memory for actual caching, even if the machine has nothing else to do: the process sits at 2 x cache_mem, and since the url_rewrite programs get restarted on every -k reconfigure or (more importantly) -k rotate, we have to leave room for a second copy of that on top. This seems so inefficient that we must be doing something wrong. Is there any way we can make better use of the RAM in these machines?

Thanks for any suggestions!
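
P.S. In case it helps to see what I mean by the fork()/exec() pattern: below is my own stripped-down sketch of it, not the actual ipc.c code (the spawn_rewriter helper name and the missing pipe/error handling are my simplifications). The point it illustrates is that fork() starts each child as a full copy of a parent that is already 2 x cache_mem in size, and only afterwards does exec() replace that copy with the small rewriter.

    /* Simplified illustration of the fork()/exec() pattern described above.
     * Not the real ipc.c code: the helper name and argument handling are
     * made up, and the real code also sets up pipes to the helpers. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    static pid_t spawn_rewriter(const char *prog)
    {
        pid_t pid = fork();     /* child begins life as a copy of the
                                   multi-GB parent's address space */
        if (pid < 0) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {
            /* child: replace the copied image with the small rewriter */
            execl(prog, prog, (char *)NULL);
            perror("execl");    /* reached only if exec fails */
            _exit(1);
        }
        return pid;             /* parent keeps its full address space */
    }

    int main(void)
    {
        /* 16 helpers, as with url_rewrite_children 16 */
        for (int i = 0; i < 16; i++)
            spawn_rewriter("/usr/local/sbin/squid_redir");
        return 0;
    }

If copy-on-write were working as I expect, those 16 fork()s would be cheap; what we actually observe looks more like the full address space being duplicated each time.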