On Wed, 09 Jun 2010 22:29:50 +0300, Panagiotis Christias
<p.christias@xxxxxxxxxxx> wrote:
> Hello list (sorry for the long message),
>
> we are using eight servers (running FreeBSD 7.3 amd64 with 2 GB of
> RAM each) as transparent proxy servers (Squid 2.7.9 along with
> squidGuard 1.4) for the content filtering service of the Greek School
> Network.
>
> Our current squid configuration includes:
>   http_port 80 transparent
>   http_port 8080
>   cache_mem 192 MB
>   cache_swap_low 80
>   cache_swap_high 85
>   maximum_object_size 4096 KB
>   ipcache_size 94208
>   ipcache_low 90
>   ipcache_high 95
>   cache_store_log none
>   cache_dir null /tmp
>   redirect_program /usr/local/bin/squidGuard
>   redirect_children 48
>   redirector_bypass on
>   uri_whitespace encode
>   request_header_max_size 25 KB
>
> Although it is not peak season (schools have exams and are about to
> close for summer), we are experiencing high memory usage in some
> cases (see cache01 at http://noc.ntua.gr/~christia/squid-vsize.png).
> All boxes are identical. The memory usage seems to be related to the
> type of the requests, not their volume
> (http://noc.ntua.gr/~christia/squid-reqs.png). To rule out any
> server-specific problem we moved the client requests from one cache
> box to another, and the memory allocation problem moved along with
> them (at first it was cache04, now it is cache01).
>
> We compared the cache.log files and found no special messages on
> cache01 or on any other server that showed increased memory usage.
> After looking into the access.log we tend to believe that the memory
> spikes occur during periods when squid receives many concurrent
> requests for sites that respond very slowly or not at all
> (firewalled, not sending any packets back). In these cases the
> request rate varies from 40 to 700 reqs/sec, the average service
> time varies from 570 msec to 178502 msec, and the HTTP return codes
> are usually TCP_MISS/417, TCP_MISS/504 and TCP_MISS/000.
>
> For cache01, the cache manager reports 692882 allocated cbdata
> clientHttpRequest objects occupying 742381 KB, with 100% of them in
> use (ten times more than on the other cache boxes).
>
> The full report is available here:
> http://noc.ntua.gr/~christia/cache01.mem.20100609-21:35.txt
> http://noc.ntua.gr/~christia/cache01.info.20100609-21:35.txt
>
> At the same time, 'top' on cache01 reports for the squid process:
>   1575 MB virtual size
>   303 MB resident size
>
> Finally, a pmap (FreeBSD sysutils/pmap port) report is available
> here, in case it is useful:
> http://noc.ntua.gr/~christia/cache01.pmap.20100609-21:35.txt
>
> Anyway, is this behaviour normal, or is it a bug or memory leak? I
> am willing to try any suggestion, provide any other information and
> help with debugging.
>
> For the moment I am manually excluding from the transparent caching
> scheme the sites that seem to cause problems. I am also considering
> adding a maxconn acl and lowering some connection timeouts. Waiting
> 180000 msec for a firewalled connection is *too* much.
>
> Regards,
> Panagiotis

Looks like broken almost-HTTP/1.1 clients to me. This is the behaviour
you get when a client sends "Expect: 100-continue" without properly
supporting the feature itself. Squid does not fully support that part
of HTTP/1.1, just enough to send back the right 417 error and
negotiate its non-existence with the client. What I think is happening
is that many requests come in with the HTTP/1.1 Expect: header (which
Squid does not yet support); the clients then wait for the 100 status
code to come back.
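To illustrate, roughly (a hand-written sketch of the exchange, not a
capture from these servers; the URL and sizes are invented):

  POST /upload HTTP/1.1
  Host: www.example.com
  Content-Length: 1048576
  Expect: 100-continue

  [client now pauses, waiting for "HTTP/1.1 100 Continue"
   before it transmits the 1 MB body]

  HTTP/1.1 417 Expectation Failed

That last reply is what shows up as the TCP_MISS/417 entries you are
seeing in access.log.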
That 100 will never come. A standards-compliant Squid replies with a
417 error and hopes the client retries without the Expect: header.

Check that your cache01 is configured with "ignore_expect_100 off".
That makes Squid send the 417 error back immediately, so any clients
which actually implement the negotiation get a faster response. Broken
clients will die immediately with the error instead of dying only
after holding your resources for that whole timeout.

Amos
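P.S. A minimal squid.conf sketch of the above, together with the
maxconn acl and shorter timeouts Panagiotis mentioned (the acl name
and all the numbers here are placeholders to tune for your own setup):

  # reject Expect: 100-continue with an immediate 417
  # (off is the default in 2.7; make sure nothing overrides it)
  ignore_expect_100 off

  # cap the number of concurrent connections per client IP
  acl manyconn maxconn 50
  http_access deny manyconn

  # give up on firewalled/black-holed sites well before 3 minutes
  connect_timeout 30 seconds
  read_timeout 2 minutes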