Hello list,

I just wanted to post the results with valgrind. Unfortunately the
memcheck tool needs so much CPU that I could not put a high load on the
squid; I reached at most about 5-10 req/s.

# ./squid -v
Squid Cache: Version 3.1.3
configure options: '--prefix=/appl' '--localstate=/var'
'--with-filedescriptors=16384' '--enable-storeio=aufs'
'--enable-auth=ntlm,basic' '--enable-external-acl-helpers=wbinfo_group'
'--enable-icap-client' --enable-ltdl-convenience

I also recompiled and tried with:

# squid -v
Squid Cache: Version 3.1.3
configure options: '--prefix=/appl' '--localstate=/var'
'--with-filedescriptors=16384' '--enable-storeio=aufs'
'--enable-auth=ntlm,basic' '--enable-external-acl-helpers=wbinfo_group'
'--enable-icap-client' '--with-valgrind-debug' 'CFLAGS=-g -O2'
--enable-ltdl-convenience

I ran valgrind repeatedly with:

  valgrind --leak-check=yes -v squid -N &

and found:

==24141== 3,311,957 bytes in 3,784 blocks are definitely lost in loss record 26 of 27
==24141==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
==24141==    by 0x5ABAA7: xmalloc (util.c:508)
==24141==    by 0x5AA35A: rfc1035MessageUnpack (rfc1035.c:433)
==24141==    by 0x4B15A7: idnsGrokReply(char const*, unsigned long) (dns_internal.cc:939)
==24141==    by 0x4B22F0: idnsRead(int, void*) (dns_internal.cc:1178)
==24141==    by 0x4AC154: comm_select (comm_epoll.cc:308)
==24141==    by 0x5455AC: CommSelectEngine::checkEvents(int) (comm.cc:2682)
==24141==    by 0x4B712D: EventLoop::checkEngine(AsyncEngine*, bool) (EventLoop.cc:51)
==24141==    by 0x4B7282: EventLoop::runOnce() (EventLoop.cc:125)
==24141==    by 0x4B7377: EventLoop::run() (EventLoop.cc:95)
==24141==    by 0x4FB36C: SquidMain(int, char**) (main.cc:1379)
==24141==    by 0x4FB975: main (main.cc:1141)

I looked a bit in the source code but did not really find what could
cause this.
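In case it is useful for the discussion, here is a minimal stand-alone
sketch of the kind of pattern such a trace would fit: an unpacked DNS
reply that is freed on the normal path but not on some early-return
path. The types and function names below are my own simplified
stand-ins, not the real lib/rfc1035.c or dns_internal.cc code:

  // Hypothetical stand-ins only, NOT squid source code.
  #include <cstdlib>
  #include <cstring>

  struct DnsMessage {              // stands in for rfc1035_message
      char *rr;                    // parsed resource records (simplified)
  };

  // models rfc1035MessageUnpack(): heap-allocates the parsed reply
  static DnsMessage *unpackReply(const char *buf, size_t sz) {
      DnsMessage *msg = static_cast<DnsMessage *>(malloc(sizeof(DnsMessage)));
      msg->rr = static_cast<char *>(malloc(sz));
      memcpy(msg->rr, buf, sz);
      return msg;
  }

  // models rfc1035MessageDestroy(): must run on *every* exit path
  // of the caller, not just the happy one
  static void destroyReply(DnsMessage *msg) {
      free(msg->rr);
      free(msg);
  }

  // models the shape of idnsGrokReply(): an early return that skips
  // the destroy call leaks one unpacked message per affected reply
  static void grokReply(const char *buf, size_t sz, bool oddReply) {
      DnsMessage *msg = unpackReply(buf, sz);
      if (oddReply)
          return;                  // LEAK: msg never destroyed here
      // ... match reply to query, cache the answers ...
      destroyReply(msg);           // only the normal path frees it
  }

  int main() {
      const char raw[] = "fake DNS reply bytes";
      grokReply(raw, sizeof(raw), true);   // valgrind: definitely lost
      grokReply(raw, sizeof(raw), false);  // clean
  }

Compiled with g++ and run under valgrind, this gives a "definitely
lost" record of the same shape as the one above, so maybe one of the
error or mismatch paths in idnsGrokReply() skips the destroy.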
Sometimes DNS did not seem to lose memory, but then I found this
instead:

==29780== 987,870 (987,046 direct, 824 indirect) bytes in 1,321 blocks are definitely lost in loss record 27 of 28
==29780==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
==29780==    by 0x5ABAA7: xmalloc (util.c:508)
==29780==    by 0x5ABBAB: xstrdup (util.c:756)
==29780==    by 0x4B3E15: errorTryLoadText(char const*, char const*, bool) (errorpage.cc:313)
==29780==    by 0x4B494F: ErrorState::BuildContent() (errorpage.cc:1007)
==29780==    by 0x4B551D: ErrorState::BuildHttpReply() (errorpage.cc:881)
==29780==    by 0x4B58E5: errorAppendEntry (errorpage.cc:432)
==29780==    by 0x51D656: store_client::callback(long, bool) (store_client.cc:164)
==29780==    by 0x51DA2F: store_client::scheduleMemRead() (store_client.cc:448)
==29780==    by 0x51E567: storeClientCopy2(StoreEntry*, store_client*) (store_client.cc:331)
==29780==    by 0x51E8D3: store_client::copy(StoreEntry*, StoreIOBuffer, void (*)(void*, StoreIOBuffer), void*) (store_client.cc:264)
==29780==    by 0x4A0D0E: clientReplyContext::doGetMoreData() (client_side_reply.cc:1675)
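The xstrdup() at the top of that trace suggests a similar ownership
problem: the error page template text is duplicated on the heap and the
caller has to free it on every path. Again a simplified stand-alone
sketch with my own stand-in names, not squid's actual errorpage.cc
code:

  // Hypothetical stand-ins only, NOT squid source code.
  #include <cstdlib>
  #include <cstring>
  #include <memory>

  // models errorTryLoadText(): returns a heap copy (like xstrdup)
  // that the caller now owns
  static char *tryLoadText(const char *page) {
      char *copy = static_cast<char *>(malloc(strlen(page) + 1));
      strcpy(copy, page);
      return copy;
  }

  // a caller shaped like the leak: the copy is used but never freed
  static void buildContentLeaky(const char *page) {
      char *text = tryLoadText(page);
      // ... expand %-codes into the reply body ...
      (void)text;                  // LEAK: missing free(text)
  }

  // the same caller with ownership made explicit: free() runs on
  // every exit path when the unique_ptr goes out of scope
  static void buildContentFixed(const char *page) {
      std::unique_ptr<char, void (*)(void *)> text(tryLoadText(page),
                                                   std::free);
      // ... expand %-codes into the reply body ...
  }

  int main() {
      buildContentLeaky("ERR_ACCESS_DENIED template");  // flagged
      buildContentFixed("ERR_ACCESS_DENIED template");  // clean
  }

If something like the leaky variant happens in BuildContent(), every
generated error page would leak one copy of the template text, which
would fit the roughly 750 bytes per block in the record above
(987,046 / 1,321 blocks).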
When running valgrind with 3.0.STABLE23 I did not find similar lost
blocks, only a few KB lost at initialization, but 3.1 also loses a few
KB at that point.

I monitored a squid 3.0.STABLE25 and a squid 3.1.3/3.1.4 over a longer
period and found that both need more memory over time, but 3.0
eventually stops growing. 3.1 continues to grow until the CPU usage
rises to nearly 100%; then the memory consumption seems to stop.

Does anyone have an idea where the problem could be?

Martin


Marcus Kool <marcus.kool@xxxxxxxxxxxxxxx> wrote on 17.06.2010 16:15:09:

> Martin,
>
> Valgrind is a memory leak detection tool.
> You need some developer skills to run it.
>
> If you have a test environment with low load you may want
> to give it a try:
> - download the squid sources
> - run configure with CFLAGS="-g -O2"
> - run squid with valgrind
> - wait
> - kill squid with a TERM signal and look at the valgrind log file
>
> Valgrind uses a lot of memory for its own administration and has
> a lot of CPU overhead, so reduce cache_mem to a small value like 32MB.
>
> Most likely you will see many memory leaks, because
> Squid does not free everything when it exits. This is normal.
> You need to look at the repeated memory leaks (the leaks that
> occur often) and file a bug report.
>
> Please do not post the whole valgrind output to this list.
>
> Marcus
>
>
> Martin.Pichlmaier@xxxxxxxxxxxxxxxxxxxxxxxxxxx wrote:
> > Hello,
> >
> > I just wanted to report back the last tests:
> >
> > After the memory cache is filled to 100%, the squid (3.1.4 or 3.1.3)
> > still needs more memory over time when under load, about 1-2 GB a day.
> > memory_pools off did not change anything; the process size still rises.
> > The high CPU usage seems to start when the process grows past a
> > certain size, but I am not sure about that.
> > Example memory consumption of squid-3.1.4:
> > from 8.0 GB (4pm) to 8.4 GB (7pm) to 8.5 GB (4am next day) to 9.4 GB
> > (2pm).
> > At night there is low load on the squid, maybe 20-50 req/s.
> > 3.1.3 behaves the same, so it does not seem to be related to the
> > "ctx enter/exit level" topic discussed in the last mails.
> >
> > I am now reverting the proxies back to 3.0.STABLE25 but will keep one
> > proxy on 3.1.4 for testing.
> > Probably something in my setup causes squid to consume too much memory.
> >
> > Amos, do you have another idea what might cause this and where to
> > look, for example at which debug level? I can do some tests, and I
> > have the possibility to slowly put this proxy under load and take it
> > out of the production environment afterwards...
> >
> > Regards,
> > Martin
> >
> >
> >
> > Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote on 15.06.2010 13:31:40:
> >
> > <snip>
> >>> I am now checking with mem pools off on one of the proxies and will
> >>> report later whether it changes anything.
> > <snip>
> >>> 2010/06/15 11:37:06| ctx: enter level 2059: '<lost>'
> >>> 2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
> >>> 2010/06/15 11:37:06| WARNING: suspicious CR characters in HTTP header {X:}
> >>> 2010/06/15 11:37:06| ctx: exit level 2060
> >>> 2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
> >>> 2010/06/15 11:37:06| WARNING: suspicious CR characters in HTTP header {X:}
> >>> 2010/06/15 11:37:06| ctx: exit level 2060
> >>> 2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
> >>> 2010/06/15 11:37:06| WARNING: suspicious CR characters in HTTP header {X:}
> >>> 2010/06/15 11:37:06| ctx: exit level 2060
> >>> 2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
> >>> 2010/06/15 11:37:06| WARNING: suspicious CR characters in HTTP header {X:}
> >>> 2010/06/15 11:37:06| ctx: exit level 2060
> >>> 2010/06/15 11:37:06| ctx: enter level 2060: '<lost>'
> >>> 2010/06/15 11:37:06| WARNING: suspicious CR characters in HTTP header {X:}
> >>> 2010/06/15 11:40:56| ctx: exit level 2060
> >>> 2010/06/15 11:40:56| ctx: enter level 2060: '<lost>'
> >>> 2010/06/15 11:40:56| ctx: enter level 2061: '<lost>'
> >>
> >> Ouch. We've been wondering about these ctx loops. It is not something
> >> to be terribly worried about, but it can cause some "weird stuff"
> >> (yes, that is probably the best explanation).
> >>
> >> Thanks to your reminder, I've just had another look and found one
> >> more in header processing. Hopefully that was it.
> >>
> >> Amos
> >> --
> >> Please be using
> >>   Current Stable Squid 2.7.STABLE9 or 3.1.4