
Re: Antwort: Re: [squid-users] Memory and CPU usage squid-3.1.4

The memory leak is caused by idnsGrokReply: the caller
of rfc1035MessageUnpack should free the memory using
rfc1035MessageDestroy.

The function idnsGrokReply changed in several places between
3.0.x and 3.1.x.
It is obvious that not all paths call rfc1035MessageDestroy,
but I do not know this code and feel uncomfortable
making a patch, so I suggest that the person who maintains
this part of the code looks at it.
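
As an illustration only, here is a minimal sketch of the pattern the caller
has to follow. The two rfc1035 function names exist in squid's lib/rfc1035.c,
but the exact signatures and the error behaviour assumed below are my
assumptions, not copied from the 3.1 source:

   /* Illustrative sketch only -- not the real idnsGrokReply().
    * The point: every return path taken after rfc1035MessageUnpack()
    * has allocated a message must call rfc1035MessageDestroy(),
    * otherwise the unpacked message (and its records) leaks. */
   static void
   grokReplySketch(const char *buf, size_t sz)
   {
       rfc1035_message *message = NULL;
       int n = rfc1035MessageUnpack(buf, sz, &message);

       if (message == NULL)
           return;                           /* nothing was allocated */

       if (n < 0) {                          /* malformed reply, but the
                                                message may still exist */
           rfc1035MessageDestroy(&message);  /* the call that is easy to forget */
           return;
       }

       /* ... hand the answers to the waiting lookup ... */

       rfc1035MessageDestroy(&message);      /* normal exit path */
   }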

Marcus


Martin.Pichlmaier@xxxxxxxxxxxxxxxxxxxxxxxxxxx wrote:
Hello list,

I just wanted to post the results with valgrind.
Unfortunately the memcheck tool needs so much CPU that I could not
put a high load on the squid; the maximum was only about 5-10 req/s.

# ./squid -v
Squid Cache: Version 3.1.3
configure options: '--prefix=/appl' '--localstate=/var' '--with-filedescriptors=16384' '--enable-storeio=aufs' '--enable-auth=ntlm,basic' '--enable-external-acl-helpers=wbinfo_group' '--enable-icap-client' --enable-ltdl-convenience

also recompiled and tried with:
# squid -v
Squid Cache: Version 3.1.3
configure options: '--prefix=/appl' '--localstate=/var' '--with-filedescriptors=16384' '--enable-storeio=aufs' '--enable-auth=ntlm,basic' '--enable-external-acl-helpers=wbinfo_group' '--enable-icap-client' '--with-valgrind-debug' 'CFLAGS=-g -O2' --enable-ltdl-convenience

I ran valgrind repeatedly with: "valgrind --leak-check=yes -v squid -N &" and found:

==24141== 3,311,957 bytes in 3,784 blocks are definitely lost in loss record 26 of 27
==24141==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
==24141==    by 0x5ABAA7: xmalloc (util.c:508)
==24141==    by 0x5AA35A: rfc1035MessageUnpack (rfc1035.c:433)
==24141==    by 0x4B15A7: idnsGrokReply(char const*, unsigned long) (dns_internal.cc:939)
==24141==    by 0x4B22F0: idnsRead(int, void*) (dns_internal.cc:1178)
==24141==    by 0x4AC154: comm_select (comm_epoll.cc:308)
==24141==    by 0x5455AC: CommSelectEngine::checkEvents(int) (comm.cc:2682)
==24141==    by 0x4B712D: EventLoop::checkEngine(AsyncEngine*, bool) (EventLoop.cc:51)
==24141==    by 0x4B7282: EventLoop::runOnce() (EventLoop.cc:125)
==24141==    by 0x4B7377: EventLoop::run() (EventLoop.cc:95)
==24141==    by 0x4FB36C: SquidMain(int, char**) (main.cc:1379)
==24141==    by 0x4FB975: main (main.cc:1141)

I looked a bit in the source code but didn't really find what could cause this.

Sometimes the DNS code did not seem to lose memory, but I found this instead:

==29780== 987,870 (987,046 direct, 824 indirect) bytes in 1,321 blocks are definitely lost in loss record 27 of 28
==29780==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
==29780==    by 0x5ABAA7: xmalloc (util.c:508)
==29780==    by 0x5ABBAB: xstrdup (util.c:756)
==29780==    by 0x4B3E15: errorTryLoadText(char const*, char const*, bool) (errorpage.cc:313)
==29780==    by 0x4B494F: ErrorState::BuildContent() (errorpage.cc:1007)
==29780==    by 0x4B551D: ErrorState::BuildHttpReply() (errorpage.cc:881)
==29780==    by 0x4B58E5: errorAppendEntry (errorpage.cc:432)
==29780==    by 0x51D656: store_client::callback(long, bool) (store_client.cc:164)
==29780==    by 0x51DA2F: store_client::scheduleMemRead() (store_client.cc:448)
==29780==    by 0x51E567: storeClientCopy2(StoreEntry*, store_client*) (store_client.cc:331)
==29780==    by 0x51E8D3: store_client::copy(StoreEntry*, StoreIOBuffer, void (*)(void*, StoreIOBuffer), void*) (store_client.cc:264)
==29780==    by 0x4A0D0E: clientReplyContext::doGetMoreData() (client_side_reply.cc:1675)

When running valgrind with 3.0.STABLE23 I did not find similar lost blocks, only a few KB lost during initialization, but 3.1 loses a few KB there as well.

I monitored a squid 3.0.STABLE25 and squid 3.1.3/3.1.4 over a longer period and found that both need more memory over time, but 3.0 eventually stops growing. 3.1 continues to grow until CPU usage rises to nearly 100%; then the memory consumption
seems to stop.

Does someone have an idea where the problem could be?


Martin




Marcus Kool <marcus.kool@xxxxxxxxxxxxxxx> wrote on 17.06.2010 16:15:09:

Martin,

Valgrind is a memory leak detection tool.
You need some developer skills to run it.

If you have a test environment with low load you may want
to give it a try.
- download the squid sources
- run configure with CFLAGS="-g -O2"
- run squid with valgrind
- wait
- kill squid with a TERM signal and look at the valgrind log file

Valgrind uses a lot of memory for its own administration and has
a lot of CPU overhead, so reduce cache_mem to a small value like 32MB.
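
For example, a test run could look roughly like this (the --log-file name,
the configure options and the kill step are only placeholders; adjust them
to your installation, and the cache_mem line goes into squid.conf):

   # squid.conf: keep the memory cache small while testing
   cache_mem 32 MB

   # build with debug symbols, then run squid in the foreground under valgrind
   ./configure CFLAGS="-g -O2" ...
   make && make install
   valgrind --leak-check=yes --log-file=valgrind.log squid -N &

   # after some test traffic, stop squid and read the report
   kill -TERM <squid-pid>
   less valgrind.log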

Most likely you will see many memory leaks because
Squid does not free everything when it exits. This is normal.
You need to look at the repeated memory leaks, i.e. the leaks
that occur often, and file a bug report for those.

Please do not post the whole valgrind output to this list.

Marcus



Martin.Pichlmaier@xxxxxxxxxxxxxxxxxxxxxxxxxxx wrote:
Hello,

I just wanted to report back the last tests:

After the memory cache is filled to 100%, squid (3.1.4 or 3.1.3)
still needs more memory over time when under load, about 1-2 GB a day.
"memory_pools off" did not change anything; the process size still rises.
The high CPU usage seems to start when the process grows beyond a certain size,
but I am not sure about that.
Example memory consumption of squid-3.1.4:
from 8.0 GB (4pm) to 8.4 GB (7pm) to 8.5 GB (4am next day) to 9.4 GB (2pm).
At night there is low load on the squid, maybe 20-50 req/s.
3.1.3 behaves the same, so it does not seem to be related to the
"ctx enter/exit level" topic discussed in the last mails.

I am now reverting the proxies back to 3.0.STABLE25 but will keep one proxy
on 3.1.4 for testing.
Probably something in my setup causes squid to consume too much memory.
Amos, do you have another idea what might cause this and where to look,
for example which debug level to use? I can do some tests and have the
possibility to slowly put this proxy under load and take it out of the
production environment afterwards...

Regards,
Martin



Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote on 15.06.2010 13:31:40:

<snip>
I am now checking with mem pools off on one of the proxies and report
later whether it changes anything.
<snip>
2010/06/15 11:37:06| ctx: enter level 2059: '<lost>'
2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
X:}0/06/15 11:37:06| WARNING: suspicious CR characters in HTTP
header
{
2010/06/15 11:37:06| ctx: exit level 2060
2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
X:}0/06/15 11:37:06| WARNING: suspicious CR characters in HTTP
header
{
2010/06/15 11:37:06| ctx: exit level 2060
2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
X:}0/06/15 11:37:06| WARNING: suspicious CR characters in HTTP
header
{
2010/06/15 11:37:06| ctx: exit level 2060
2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
X:}0/06/15 11:37:06| WARNING: suspicious CR characters in HTTP
header
{
2010/06/15 11:37:06| ctx: exit level 2060
2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
X:}0/06/15 11:37:06| WARNING: suspicious CR characters in HTTP
header
{
2010/06/15 11:40:56| ctx: exit level 2060
2010/06/15 11:40:56| ctx: enter level 2060: '<lost>'
2010/06/15 11:40:56| ctx: enter level 2061: '<lost>'

Ouch. We've been wondering about these ctx loops. It is not something
to be terribly worried about, but can cause some "weird stuff" (yes that
is probably the best explanation).

Thanks to your reminder, I've just had another look and found one more
in header processing. Hopefully that was it.

Amos
--
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.4






