On 22/12/13 09:37, Nathan Hoad wrote:
Sure thing. I've put the squid.conf and info manager page onto my
server, to save sending everyone a very large email.
anonymised+trimmed squid.conf: http://getoffmalawn.com/static/squid.conf
info manager page: http://getoffmalawn.com/static/squid-manager-info.out
[root@host ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          3040       2916        124          0         33        355
-/+ buffers/cache:       2526        514
Swap:          645        342        302
OK, so it seems the basic issue is probably not at the level of FDs
or something similar.
I do not know why the RAM usage is so high, but the basic tests I would
do are:
- remove any cache_dir from squid
- remove any memory cache from squid
- make sure that swap usage is at 0
- check which other processes run on this machine.
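The last two checks in that list can be done in one go with something
like this (a sketch; the procfs paths are standard Linux, adjust if your
distribution differs):

```shell
# Is anything sitting in swap? Both values come from /proc/meminfo.
awk '/^SwapTotal:|^SwapFree:/ {print $1, $2, $3}' /proc/meminfo

# Which processes hold the most resident memory right now?
ps axo pid,rss,comm --sort=-rss | head -n 10
```

If SwapFree is well below SwapTotal while squid is idle, something has
already been pushed out of RAM.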
I have a tiny proxy with 1GB of RAM and it runs 3.4.X with no problems
that I know of yet.
Since it's a 32-bit machine with a couple of quirks, I have a spec file
that has worked fine for it on the latest build of squid.
If you do not use authentication, remove any of the non-relevant build
options.
Also, if you can get to a position where all of your machine's details
are known, such as what each process uses, you can answer a lot of this
yourself.
ps avx -L | grep squid
can give more detail, but top output for every squid process and
subprocess would also be nice.
As you can see, these processes do use a fair amount of RAM, but if we
can make sure that no RAM is "slipping" away somewhere unaccounted for
in top, we will have more solid ground to stand on.
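To capture that per-process picture in one shot, a loop like this could
run alongside top (a sketch; discovering PIDs via pgrep is an assumption,
use your pid file if you prefer):

```shell
# Print PID and resident set size (VmRSS) for every squid process.
# Does nothing, silently, if no squid is running.
for pid in $(pgrep squid 2>/dev/null); do
    awk -v p="$pid" '/^VmRSS:/ {print "pid", p ":", $2, $3}' /proc/"$pid"/status
done
```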
What would "http://squid_ip:port/squid-internal-mgr/mem" give us?
It has a lot of details that can be analyzed.
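For example, with a saved copy of that page (squidclient mgr:mem >
mem.out), something like this can surface the biggest pools. The column
number here is an assumption about the report layout; check the header
row of your own output first:

```shell
# Rank memory pools by the value in column 4 of a saved mgr:mem report
# (a sketch; the report is tab-separated and the column index may differ
# between Squid versions -- verify against the header line).
sort -t "$(printf '\t')" -k 4 -rn mem.out 2>/dev/null | head -n 10
```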
Can you by any chance describe the size of the network? We are not
talking about a home network.. 29 clients? IP addresses?
I would try to run the proxy with no cache at all for about 24-48 hours,
to see if there is a peak in FD usage, or any load on the service, that
matches what you are describing.
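To make that 24-48 hour window easy to read afterwards, a tiny logger
like this could record the FD count at a fixed interval (a sketch; the
pgrep call and log path are assumptions):

```shell
# log_fds PID SAMPLES INTERVAL -- append "epoch-seconds fd-count" lines.
log_fds() {
    i=0
    while [ "$i" -lt "$2" ]; do
        printf '%s %s\n' "$(date +%s)" "$(ls "/proc/$1/fd" 2>/dev/null | wc -l)"
        i=$((i + 1))
        sleep "$3"
    done
}

# e.g.: log_fds "$(pgrep -o squid)" 2880 60 >> /var/log/squid-fd.log
```

A slow climb in the second column over the run would point at an FD leak
rather than (or alongside) a memory one.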
There is always the possibility of a memory leak, and it's not always
related directly to squid.
What are the network requirements? Are you using WCCP, for example?
All The Bests,
Eliezer
For reference, from top:
17715 squid 15 0 788m 766m 5528 D 19.6 25.2 18:12.23 (squid-1) -f /etc/squid/squid.conf
13183 squid 15 0 624m 600m 5444 S 0.0 19.7 14:57.00 (squid-1) -f /etc/squid/squid.conf2  # second instance, serving transparent traffic only, currently inactive
Note that the manager, free and top output are not from my most recent
run (i.e. they will not match the cache.log I'll mention later). There
was a bit of an emergency during the last run with full logging that
required me to kill everything before I had time to gather these pieces
of information. Running out of RAM does that...
Any directive mentioned in the squid.conf can be disabled if it will
help isolate the leak. I'm also happy to rebuild Squid with any
other parameters that may be useful. I'm currently building via an RPM
spec file on a CentOS 5 box, with gcc 4.1.2 (ancient, I know!) and the
following options:
Squid Cache: Version 3.3.11
configure options: '--build=i686-redhat-linux-gnu'
'--host=i686-redhat-linux-gnu' '--target=i586-redhat-linux-gnu'
'--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr'
'--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc'
'--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib'
'--libexecdir=/usr/libexec' '--sharedstatedir=/usr/com'
'--mandir=/usr/share/man' '--infodir=/usr/share/info'
'--exec_prefix=/usr' '--libexecdir=/usr/lib/squid'
'--localstatedir=/var' '--datadir=/usr/share/squid'
'--sysconfdir=/etc/squid' '--with-logdir=$(localstatedir)/log/squid'
'--with-pidfile=$(localstatedir)/run/squid.pid'
'--disable-dependency-tracking' '--enable-arp-acl'
'--enable-follow-x-forwarded-for' '--enable-auth'
'--enable-auth-basic=DB,LDAP,MSNT,MSNT-multi-domain,NCSA,NIS,PAM,POP3,RADIUS,SASL,SMB,getpwnam'
'--enable-auth-ntlm=smb_lm,fake'
'--enable-auth-digest=file,LDAP,eDirectory' '--enable-auth-negotiate'
'--enable-external-acl-helpers=ip_user,ldap_group,session,unix_group,wbinfo_group'
'--enable-cache-digests' '--enable-cachemgr-hostname=localhost'
'--enable-delay-pools' '--enable-epoll' '--enable-icap-client'
'--enable-ident-lookups' '--with-large-files'
'--enable-linux-netfilter' '--enable-referer-log'
'--enable-removal-policies=heap,lru' '--enable-snmp' '--enable-ssl'
'--enable-ssl-crtd' '--enable-storeio=aufs,diskd,ufs'
'--enable-useragent-log' '--enable-wccpv2' '--with-aio'
'--with-default-user=squid' '--with-dl' '--with-openssl'
'--with-pthreads' '--disable-ipv6' '--disable-loadable-modules'
'--disable-eui' 'build_alias=i686-redhat-linux-gnu'
'host_alias=i686-redhat-linux-gnu'
'target_alias=i586-redhat-linux-gnu' 'CFLAGS=-O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m32 -march=i586
-fasynchronous-unwind-tables -fpie' 'LIBS=-lpresenceclient
-L/usr/local/lib' 'CXXFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32
-march=i586 -fasynchronous-unwind-tables -fpie'
'PKG_CONFIG_PATH=/usr/lib/pkgconfig:/usr/share/pkgconfig'
--enable-ltdl-convenience
Thanks,
Nathan.
On Wed, Dec 18, 2013 at 11:38 AM, Eliezer Croitoru <eliezer@xxxxxxxxxxxx> wrote:
OK Nathan,
The next step is your squid.conf, which can clarify a couple of things.
You also have the cache manager interface over HTTP, and in it you have
statistics:
http://proxy_ip:3128/squid-internal-mgr/info
(the above is an example). It will provide much more data than just
looking at memory usage.
Also, please provide "free -m" output.
Thanks,
Eliezer
On 17/12/13 07:24, Nathan Hoad wrote:
Okay, to follow up. I still cannot reproduce this in a lab
environment, but I have implemented a way of doing what Alex described
on the production machine. I run two instances of Squid with the same
config and switch the transparent proxy out by changing the redirect
rules in iptables. The second instance is running without a cache_dir
though, to prevent the possibility of two instances sharing the same
directory and running amok. If requested, I can create a second
cache_dir for the second instance to mimic the config entirely.
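For anyone wanting to reproduce the setup, the switch-over is just a
rewrite of one NAT rule; a sketch of the kind of thing I mean (the
interface, rule position and ports here are placeholders, not my real
ruleset):

```shell
# Point intercepted port-80 traffic at the first instance...
iptables -t nat -R PREROUTING 1 -i eth1 -p tcp --dport 80 \
    -j REDIRECT --to-ports 3129

# ...or replace the same rule to point at the second instance.
iptables -t nat -R PREROUTING 1 -i eth1 -p tcp --dport 80 \
    -j REDIRECT --to-ports 3130
```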
While running under this configuration, I've confirmed that memory
usage does go up when active, and stays at that level when inactive,
allowing some time for timeouts and whatnot. I'm currently switching
between the two instances every fifteen minutes.
Here is a link to the memory graph for the entire running time of the
second process, at 1 minute intervals:
http://getoffmalawn.com/static/mem-graph.png. The graph shows memory
use steadily increasing during activity, but remaining reasonably
stable during inactivity.
Where shall we go from here? Given that I can switch between the
instances, impacting performance on the production box is not of huge
concern now, so I can run the second instance under Valgrind, or bump
up the debug logging, or whatever would be helpful.
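For the Valgrind option, I'd expect to run the second instance in
no-daemon mode, roughly like this (a sketch; the log path is an
assumption, and the build should keep -g for readable traces -- expect a
heavy slowdown):

```shell
# Run the second instance under Valgrind in the foreground (-N) so the
# worker stays in the traced process.
valgrind --leak-check=full --show-reachable=yes --num-callers=30 \
    --log-file=/var/log/squid/valgrind-%p.log \
    /usr/sbin/squid -N -f /etc/squid/squid.conf2
```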
As an aside, I've been reading some of the code pointed at by traces
I've got, and I've stumbled upon the fact that nearly every caller of
StoreEntry::replaceHttpReply will leak HttpReply objects if the
internal mem_obj pointer of a StoreEntry is set to NULL. There's a
critical log message that occurs in this situation which I have not
seen, so I can conclude that this is not the issue I am seeing, but
it's an issue nonetheless. If there's interest, I'll submit a patch
for this issue.
Many thanks,
Nathan.
--
Nathan Hoad
Software Developer
www.getoffmalawn.com
On Sat, Dec 14, 2013 at 8:11 PM, Nathan Hoad <nathan@xxxxxxxxxxxxxxxx>
wrote:
On Fri, Dec 13, 2013 at 10:33 PM, Eliezer Croitoru <eliezer@xxxxxxxxxxxx>
wrote:
Hey Nathan,
I am looking for more details on the subject at hand, in the shape of:
Networking Hardware
Straight out of lspci:
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722
Gigabit Ethernet PCI Express
03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703
Gigabit Ethernet (rev 10)
Two network cards - one for internal traffic, the other for external.
Testing Methods
- a mixture of direct and intercepted HTTP and HTTPS traffic, hitting
the configured ICAP server and not.
- both valid and invalid upstream SSL certificates, hundreds of
concurrent requests from a single client
- thrashing Squid with thousands of connections that are aborted
after 800ms, running for ~30-40 seconds at a time.
- currently I'm putting the week's access.log through Squid to see if
that triggers it, for a poor approximation of the traffic.
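The replay itself is nothing clever; roughly this (a sketch, assuming
the default native log format where the URL is field 7, a proxy at
127.0.0.1:3128, and ignoring timing, methods and request bodies):

```shell
# Re-request every URL from the log through the proxy; curl failures are
# ignored so dead origin servers don't stop the replay.
awk '{print $7}' access.log 2>/dev/null |
while read -r url; do
    curl -s -o /dev/null -x http://127.0.0.1:3128 "$url" || true
done
```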
Is it a SMP squid setup?
- both SMP (2 workers) and non-SMP.
In case you are using a 32-bit system, which is limited to how much RAM?
(I remember something about a Windows NT with 64GB.)
- This particular host has 3GB of RAM. It was previously running a
non-SMP Squid 3.2.13 instance which, according to logs, maxed out at
~500MB resident after running for hours or days at a time, with a 220MB
cache_mem. Now, however, the memory usage grows to 900MB in ~40
minutes, and typically reaches 1.5GB in ~4 hours. We have a ulimit in
place to kill it once it hits 1.5GB, but prior to putting that in
place it typically reached 2GB.
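For reference, the cap is just an address-space limit set before Squid
starts, along these lines (a sketch; 1.5 GB expressed in KB, and
ulimit -v caps virtual size, which is a rough stand-in for the resident
growth we're seeing):

```shell
(
    ulimit -v 1572864      # 1.5 GB in KB; further allocations then fail
    # exec /usr/sbin/squid -f /etc/squid/squid.conf   # started under the cap
    ulimit -v              # prints the active limit: 1572864
)
```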
If you can provide more details I will be happy to try and test it.
Thanks,
Eliezer
If there's any other information you think may be useful, feel free to
ask.