Re: squid cpu problem

a.afach@xxxxxxxxxxxxx · Sun, 06 Apr 2014 10:00:38 -0500

Dear Amos
we are using 2 GB of memory as cache_mem and 128 KB as the max object
size in memory.

Anyway, I decreased the max object size in memory to 32 KB and still
have the spikes.
This is the GDB backtrace of squid during a 100% spike, but this time
it points to epoll. Is this the same error as the previous one?

0x00007ffa7466e103 in epoll_wait () from /lib64/libc.so.6
(gdb) backtrace
#0  0x00007ffa7466e103 in epoll_wait () from /lib64/libc.so.6
#1  0x00000000004bdee2 in comm_select (msec=<optimized out>) at 
comm_epoll.cc:266
#2  0x000000000057606e in CommSelectEngine::checkEvents 
(this=<optimized out>, timeout=<optimized out>) at comm.cc:2688
#3  0x00000000004d35d4 in EventLoop::checkEngine (this=0x7ffff604d7a0, 
engine=0x7ffff604d810, primary=<optimized out>) at EventLoop.cc:50
#4  0x00000000004d3845 in EventLoop::runOnce (this=0x7ffff604d7a0) at 
EventLoop.cc:124
#5  0x00000000004d3938 in EventLoop::run (this=0x7ffff604d7a0) at 
EventLoop.cc:94
#6  0x000000000051d35b in SquidMain (argc=<optimized out>, 
argv=<optimized out>) at main.cc:1418
#7  0x000000000051dd83 in SquidMainSafe (argv=<optimized out>, 
argc=<optimized out>) at main.cc:1176
#8  main (argc=<optimized out>, argv=<optimized out>) at main.cc:1168

Regards
Ayham

On 2014-04-05 03:37, Amos Jeffries wrote:
This looks like the CPU cycles are being consumed by walking one or
more very long lists of memory pieces and writing them to disk one by
one. Note the UFSStoreState::write parameter size=4096 in the backtrace
for how big those memory pages are.
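
To make the cost of that list walk concrete, here is a minimal sketch. It is not Squid's actual code: Node, pushWalk, pagesFor and totalSteps are hypothetical names, and it assumes each 4096-byte page is queued by walking the list from its head to its tail, as the "while (*L)" line in the backtrace suggests.

```cpp
#include <cstddef>

// Hypothetical stand-in for a singly linked list node (not Squid's type).
struct Node {
    void *ptr;
    Node *next;
};

// Append p at the tail by walking from the head.
// Returns how many links had to be followed to find the tail.
std::size_t pushWalk(Node **head, void *p) {
    std::size_t steps = 0;
    while (*head) {
        head = &(*head)->next;
        ++steps;
    }
    *head = new Node{p, nullptr};
    return steps;
}

// Number of pages needed to hold `bytes`, rounding up.
// The 4096 default is taken from size=4096 in the backtrace.
std::size_t pagesFor(std::size_t bytes, std::size_t pageSize = 4096) {
    return (bytes + pageSize - 1) / pageSize;
}

// Total links followed when n pages are queued one after another
// with nothing drained in between (worst case assumption).
std::size_t totalSteps(std::size_t n) {
    Node *head = nullptr;
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += pushWalk(&head, nullptr);
    while (head) {              // release the sketch's nodes
        Node *next = head->next;
        delete head;
        head = next;
    }
    return total;
}
```

Under those assumptions the work grows as n*(n-1)/2: totalSteps(pagesFor(32 * 1024)) follows 28 links, a 128 KB object follows 496, and a 1 MB object follows 32640. Whether the real list ever gets that long depends on how quickly pending writes are drained, which this sketch does not model.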

Which could happen if you cached a very big object in cache_mem and
then a random time later it needed swapping out to disk to free up
memory.

It could also happen if Squid needed to suddenly swap out a large
number of smaller items to make memory space available for a large one
which is about to arrive.

So, have you configured Squid to allow very large objects (many MB or
GB) in memory storage?

Note these causes would not show up in the testing you mentioned unless
you had a very wide range of test object sizes being pumped randomly
through the proxy. A tool like web polygraph is best to test that
traffic behaviour accurately.

Amos

On 5/04/2014 1:59 a.m., a.afach wrote:
Dear all
I still have the CPU spikes even when I used
--disable-strict-error-checking without setting CFLAGS.

this is the gdb backtrace while the CPU spikes

0x000000000051b348 in linklistPush (L=0x11853e188, p=0xce6d4300) at
list.cc:47
47          while (*L)
(gdb) backtrace
#0  0x000000000051b348 in linklistPush (L=0x11853e188, p=0xce6d4300) 
at
list.cc:47
#1  0x00000000005a70a1 in UFSStoreState::write (this=0xb3970e28,
    buf=0x11fe69ca0
"!v\253r[/\307\232G\b\375`\237:\213\256^\335\373{\241%\232\363\021\071>`\342\033\177a\202G\320{\323%\236K\342\243*\332\316\351\231=\360\370\313Ro=\317\262\243\315\027\351,\221\230\353Z\023\024q\"QSC\036\214:M\242{@\351m\020\337Cw_\214\216\304\226\265\a\375\031\211\243V\222T\320\016\227\312-\211Sz\326^\346\230\251\327\222\n\373I\032\341\303==U\214\277\264\244\205\b1\346S=\230\215\204\245\254>\312\223\066\336\230PpP\227\271\370\266;\362\226\242\036\225\235w\330\325\061\316{o_\364\021\062\351\376\062|\313\006`\357m\206FQ0\021\030C\224\004]\336\315\371\033h1\361\363\350d\366\066"...,
size=4096, aOffset=-1, free_func=0x5203b0 
<memNodeWriteComplete(void*)>)
    at ufs/store_io_ufs.cc:247
#2  0x0000000000554ca0 in doPages (anEntry=<optimized out>) at
store_swapout.cc:160
#3  StoreEntry::swapOut (this=0x372ca10) at store_swapout.cc:279
#4  0x000000000054c986 in StoreEntry::invokeHandlers (this=0x372ca10) 
at
store_client.cc:714
#5  0x00000000004dc1a7 in FwdState::complete (this=0xbb502b48) at
forward.cc:341
#6  0x00000000005579a5 in ServerStateData::completeForwarding
(this=0xf8030588) at Server.cc:239
#7  0x00000000005571bd in ServerStateData::serverComplete2
(this=0xf8030588) at Server.cc:207
#8  0x00000000004ff3dc in HttpStateData::processReplyBody
(this=0xf8030588) at http.cc:1382
#9  0x00000000004fd367 in HttpStateData::readReply (this=0xf8030588,
io=...) at http.cc:1161
#10 0x0000000000503156 in JobDialer<HttpStateData>::dial
(this=0xde75ca50, call=...) at base/AsyncJobCalls.h:175
#11 0x0000000000569ee4 in AsyncCall::make (this=0xde75ca20) at
AsyncCall.cc:34
#12 0x000000000056cb76 in AsyncCallQueue::fireNext (this=<optimized
out>) at AsyncCallQueue.cc:53
#13 0x000000000056ccf0 in AsyncCallQueue::fire (this=0x2586400) at
AsyncCallQueue.cc:39
#14 0x00000000004d385c in EventLoop::runOnce (this=0x7fffcb3518d0) at
EventLoop.cc:130
#15 0x00000000004d3938 in EventLoop::run (this=0x7fffcb3518d0) at
EventLoop.cc:94
#16 0x000000000051d35b in SquidMain (argc=<optimized out>,
argv=<optimized out>) at main.cc:1418
#17 0x000000000051dd83 in SquidMainSafe (argv=<optimized out>,
argc=<optimized out>) at main.cc:1176
#18 main (argc=<optimized out>, argv=<optimized out>) at main.cc:1168

Any idea what is causing the CPU spike?

On 2014-03-31 16:34, Amos Jeffries wrote:
On 2014-04-01 02:10, a.afach wrote:
Dear Eliezer
these are the configure options ...
configure options:  '--prefix=/usr/local/squid-3.1.19'
'--sysconfdir=/etc' '--sysconfdir=/etc/squid' 
'--localstatedir=/var'
'--enable-auth=basic,digest,ntlm' 
'--enable-removal-policies=lru,heap'
'--enable-digest-auth-helpers=password'
'--enable-basic-auth-helpers=PAM,getpwnam,NCSA,MSNT'
'--enable-external-acl-helpers=ip_user,session,unix_group'
'--enable-ntlm-auth-helpers=fakeauth'
'--enable-ident-lookups--enable-useragent-log'
'--enable-cache-digests' '--enable-delay-pools' 
'--enable-referer-log'
'--enable-arp-acl' '--with-pthreads' '--with-large-files'
'--enable-htcp' '--enable-carp' '--enable-follow-x-forwarded-for'
'--enable-snmp' '--enable-ssl' '--enable-storeio=ufs,diskd,aufs'
'--enable-async-io' '--enable-linux-netfilter' '--enable-epoll'
'--with-squid=/usr/squid-3.1.19' '--disable-ipv6' '--with-aio'
'--with-aio-threads=128' 'build_alias=x86_64-pc-linux-gnu'
'host_alias=x86_64-pc-linux-gnu' 'CC=x86_64-pc-linux-gnu-gcc'
'CFLAGS=-O2 -pipe -m64 -mtune=generic' 'LDFLAGS=-Wl,-O1
-Wl,--as-needed' 'CXXFLAGS=' '--cache-file=/dev/null' '--srcdir=.'

Some more reasons to upgrade:
 * --disable-strict-error-checking avoids issues on Gentoo with -Werror
 * CFLAGS affects the C compiler, not the C++ compiler. The C compiler
is only used by Squid-3 to build some libraries.
 * current verified stable Gentoo Squid version is 3.3.8.
 * updating anything on Gentoo involves rebuilding a surprising number
of components from scratch. So when you get a difference like this it
really could be anywhere. Including buried in the compiler itself -
your flags are possibly changing optimization levels and CPU-specific
assembly instructions used by it.

Amos