Re: squid cpu problem

a.afach@xxxxxxxxxxxxx · Mon, 12 May 2014 05:47:03 -0500

Dear Amos,
I have 4 Squid servers. Three of them (with 2 Intel Xeon processors)
have no problem.

The problem only occurs on the fourth, a desktop server with an AMD
Phenom II X6 1090T.

All servers have 16 GB of RAM.
According to benchmark testing tools (memory, CPU, disk), the desktop
server is faster and has faster disks (SSD), yet the problem only
occurs on this server.

Could the problem be that I need a faster CPU or RAM bus, or is there a
problem with AMD?

I tried ext2/ext4 and ReiserFS with the same problem. Should I try XFS?

Thanks

On 2014-05-08 04:22, Amos Jeffries wrote:
On 8/05/2014 12:33 a.m., a.afach wrote:
Hi Amos,
As far as I can see the problem is still occurring, with other errors
in GDB, and the CPU still goes to 100%.

The "problem" is that very big objects do exist and occasionally need
to be moved from memory to disk.

This is the GDB output:

Loaded symbols for /lib64/libnss_db.so.2
0x000000000050a1c8 in linklistPush (L=0x9fb429e8, p=0x53e0be0) at list.cc:47
47      list.cc: No such file or directory.
        in list.cc
(gdb) backtrace
#0  0x000000000050a1c8 in linklistPush (L=0x9fb429e8, p=0x53e0be0) at list.cc:47
#1  0x0000000000594841 in UFSStoreState::write (this=0xb7775918,
    buf=0x723b7c70
"I\223\324\004\245\201\315\306\354P\276\372e\373\r\235\250\311\033\275P\333\344\323\211\354\275\200\362>A",
size=4096, aOffset=-1, free_func=0x50f220 <memNodeWriteComplete(void*)>)
    at ufs/store_io_ufs.cc:247
#2  0x00000000005436e0 in doPages (anEntry=<optimized out>) at store_swapout.cc:160
<snip>

I tried changing the config with no success.
The problem occurs at peak times, and also at random times when there
is no load.
How can I tell whether this is a hardware problem or a Squid problem?

Neither and both.

 It is a "non-problem" in that storing a large object to disk in small
incremental bits is going to take a lot of CPU cycles. The nature of
the task itself causes large CPU usage.

 The Squid code doing this store is not great. It walks the linked-list
of memory blocks (N^2)/2 times during the store operation.
 Also, the version you are using does not distinguish between objects
stored for future use and objects being discarded immediately. They all
go to disk on their way through Squid. So there is no way to avoid it
by configuring storage of smaller objects.

 The hardware is not able to cope with that operation being done on the
size of objects you are proxying.

Amos
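
To illustrate the cost described above, here is a minimal sketch of an
append that walks the list from the head each time, in the style of the
`while (*L)` loop that gdb reports at list.cc:47. It is not Squid's
code: the Node type, the function names and the TailList alternative
are all hypothetical and exist only for this example. Appending N nodes
this way steps over 0 + 1 + ... + (N-1) = N(N-1)/2 nodes in total,
which is where a figure of roughly (N^2)/2 comes from.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node type for illustration; not Squid's real structure.
struct Node {
    Node *next = nullptr;
};

// Append by walking from the head to the end of the list, in the style
// of the `while (*L)` loop at list.cc:47. Returns the number of nodes
// stepped over so the cost can be counted.
std::size_t pushWalking(Node **L, Node *n) {
    std::size_t steps = 0;
    while (*L) {
        L = &(*L)->next;
        ++steps;
    }
    *L = n;
    return steps;
}

// Total nodes stepped over when `count` nodes are appended one by one.
std::size_t totalStepsWalking(std::size_t count) {
    std::vector<Node> nodes(count);
    Node *head = nullptr;
    std::size_t total = 0;
    for (Node &n : nodes) total += pushWalking(&head, &n);
    return total;
}

// Assumed alternative (not taken from Squid): remember where the last
// `next` pointer lives, so each append does a constant amount of work.
struct TailList {
    Node *head = nullptr;
    Node **tail = &head;
    void push(Node *n) {
        *tail = n;
        tail = &n->next;
    }
};

// Builds a list with TailList and returns its length as a sanity check.
std::size_t lengthAfterTailPushes(std::size_t count) {
    std::vector<Node> nodes(count);
    TailList list;
    for (Node &n : nodes) list.push(&n);
    std::size_t len = 0;
    for (Node *p = list.head; p; p = p->next) ++len;
    return len;
}
```

Calling totalStepsWalking(1000) counts 499500 steps for only 1000
appends, while the tail-pointer variant does one pointer update per
append. Whether a change like this would fit Squid's store code is not
something this thread establishes.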

thanks

On 2014-04-05 03:37, Amos Jeffries wrote:
This looks like the CPU cycles are being consumed by walking one or
more very long lists of memory pieces and writing them to disk one by
one. Note the UFSStoreState::write parameter size=4096 in the backtrace
for how big those memory pages are.

Which could happen if you cached a very big object in cache_mem and
then a random time later it needed swapping out to disk to free up
memory.

It could also happen if Squid needed to suddenly swap out a large
number of smaller items to make memory space available for a large one
which is about to arrive.

So, have you configured Squid to allow very large objects (many MB or
GB) in memory storage?

Note these causes would not show up in the testing you mentioned unless
you had a very wide range of test object sizes being pumped randomly
through the proxy. A tool like Web Polygraph is best to test that
traffic behaviour accurately.

Amos
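
As a rough sense of scale for "very long lists": with the 4096-byte
pages seen in the backtrace (size=4096), an object of S bytes occupies
ceil(S / 4096) pages. The sketch below works that out, together with
the (N^2)/2 figure mentioned elsewhere in this thread. It assumes every
page write walks the whole list; the function names are hypothetical
and the numbers are estimates, not measurements.

```cpp
#include <cstdint>

// Page size taken from size=4096 in the gdb backtrace.
constexpr std::uint64_t kPageSize = 4096;

// Number of 4096-byte pages needed to hold an object (rounded up).
constexpr std::uint64_t pagesFor(std::uint64_t objectBytes) {
    return (objectBytes + kPageSize - 1) / kPageSize;
}

// Rough (N^2)/2 estimate of list steps, assuming each of the N page
// writes walks the list from its head.
constexpr std::uint64_t approxSteps(std::uint64_t pages) {
    return pages * pages / 2;
}
```

Under those assumptions a 1 MiB object is 256 pages and about 32768
steps, while a 1 GiB object is 262144 pages and about 34 billion steps.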

On 5/04/2014 1:59 a.m., a.afach wrote:
Dear all,
I still have the CPU spikes even when I use
--disable-strict-error-checking without setting CFLAGS.

This is the gdb backtrace taken while the CPU spikes:

0x000000000051b348 in linklistPush (L=0x11853e188, p=0xce6d4300) at list.cc:47
47          while (*L)
(gdb) backtrace
#0  0x000000000051b348 in linklistPush (L=0x11853e188, p=0xce6d4300) at list.cc:47
#1  0x00000000005a70a1 in UFSStoreState::write (this=0xb3970e28,
    buf=0x11fe69ca0
"!v\253r[/\307\232G\b\375`\237:\213\256^\335\373{\241%\232\363\021\071>`\342\033\177a\202G\320{\323%\236K\342\243*\332\316\351\231=\360\370\313Ro=\317\262\243\315\027\351,\221\230\353Z\023\024q\"QSC\036\214:M\242{@\351m\020\337Cw_\214\216\304\226\265\a\375\031\211\243V\222T\320\016\227\312-\211Sz\326^\346\230\251\327\222\n\373I\032\341\303==U\214\277\264\244\205\b1\346S=\230\215\204\245\254>\312\223\066\336\230PpP\227\271\370\266;\362\226\242\036\225\235w\330\325\061\316{o_\364\021\062\351\376\062|\313\006`\357m\206FQ0\021\030C\224\004]\336\315\371\033h1\361\363\350d\366\066"...,

size=4096, aOffset=-1, free_func=0x5203b0 <memNodeWriteComplete(void*)>)
    at ufs/store_io_ufs.cc:247
#2  0x0000000000554ca0 in doPages (anEntry=<optimized out>) at store_swapout.cc:160
#3  StoreEntry::swapOut (this=0x372ca10) at store_swapout.cc:279
#4  0x000000000054c986 in StoreEntry::invokeHandlers (this=0x372ca10) at store_client.cc:714
#5  0x00000000004dc1a7 in FwdState::complete (this=0xbb502b48) at forward.cc:341
#6  0x00000000005579a5 in ServerStateData::completeForwarding (this=0xf8030588) at Server.cc:239
#7  0x00000000005571bd in ServerStateData::serverComplete2 (this=0xf8030588) at Server.cc:207
#8  0x00000000004ff3dc in HttpStateData::processReplyBody (this=0xf8030588) at http.cc:1382
#9  0x00000000004fd367 in HttpStateData::readReply (this=0xf8030588, io=...) at http.cc:1161
#10 0x0000000000503156 in JobDialer<HttpStateData>::dial (this=0xde75ca50, call=...) at base/AsyncJobCalls.h:175
#11 0x0000000000569ee4 in AsyncCall::make (this=0xde75ca20) at AsyncCall.cc:34
#12 0x000000000056cb76 in AsyncCallQueue::fireNext (this=<optimized out>) at AsyncCallQueue.cc:53
#13 0x000000000056ccf0 in AsyncCallQueue::fire (this=0x2586400) at AsyncCallQueue.cc:39
#14 0x00000000004d385c in EventLoop::runOnce (this=0x7fffcb3518d0) at EventLoop.cc:130
#15 0x00000000004d3938 in EventLoop::run (this=0x7fffcb3518d0) at EventLoop.cc:94
#16 0x000000000051d35b in SquidMain (argc=<optimized out>, argv=<optimized out>) at main.cc:1418
#17 0x000000000051dd83 in SquidMainSafe (argv=<optimized out>, argc=<optimized out>) at main.cc:1176
#18 main (argc=<optimized out>, argv=<optimized out>) at main.cc:1168

Any idea what is causing the CPU spike?

On 2014-03-31 16:34, Amos Jeffries wrote:
On 2014-04-01 02:10, a.afach wrote:
Dear Eliezer
these are the configure options ...
configure options:  '--prefix=/usr/local/squid-3.1.19'
'--sysconfdir=/etc' '--sysconfdir=/etc/squid' '--localstatedir=/var'
'--enable-auth=basic,digest,ntlm' '--enable-removal-policies=lru,heap'
'--enable-digest-auth-helpers=password'
'--enable-basic-auth-helpers=PAM,getpwnam,NCSA,MSNT'
'--enable-external-acl-helpers=ip_user,session,unix_group'
'--enable-ntlm-auth-helpers=fakeauth'
'--enable-ident-lookups--enable-useragent-log'
'--enable-cache-digests' '--enable-delay-pools' '--enable-referer-log'
'--enable-arp-acl' '--with-pthreads' '--with-large-files'
'--enable-htcp' '--enable-carp' '--enable-follow-x-forwarded-for'
'--enable-snmp' '--enable-ssl' '--enable-storeio=ufs,diskd,aufs'
'--enable-async-io' '--enable-linux-netfilter' '--enable-epoll'
'--with-squid=/usr/squid-3.1.19' '--disable-ipv6' '--with-aio'
'--with-aio-threads=128' 'build_alias=x86_64-pc-linux-gnu'
'host_alias=x86_64-pc-linux-gnu' 'CC=x86_64-pc-linux-gnu-gcc'
'CFLAGS=-O2 -pipe -m64 -mtune=generic' 'LDFLAGS=-Wl,-O1
-Wl,--as-needed' 'CXXFLAGS=' '--cache-file=/dev/null' '--srcdir=.'

Some more reasons to upgrade:
 * --disable-strict-error-checking avoids issues on Gentoo with -Werror
 * CFLAGS affects the C compiler, not the C++ compiler. The C compiler
is only used by Squid-3 to build some libraries.
 * current verified stable Gentoo Squid version is 3.3.8.
 * updating anything on Gentoo involves rebuilding a surprising number
of components from scratch. So when you get a difference like this it
really could be anywhere. Including buried in the compiler itself -
your flags are possibly changing optimization levels and CPU-specific
assembly instructions used by it.

Amos
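
Since CFLAGS does not reach the C++ compiler, it can be worth checking
how C++ code is actually being compiled on a given machine. A minimal
sketch, assuming GCC or Clang, which predefine the __OPTIMIZE__ macro
when an optimization level of -O1 or higher is in effect. The function
name is hypothetical, and this only reports on the translation unit it
is compiled in, not on an existing Squid binary.

```cpp
#include <string>

// Reports whether this translation unit was compiled with optimization.
// Relies on __OPTIMIZE__, which GCC and Clang predefine at -O1 and above.
std::string optimizationLabel() {
#ifdef __OPTIMIZE__
    return "optimized";
#else
    return "unoptimized";
#endif
}
```

Building a small program that prints this label once with the same
CXXFLAGS as the Squid build and once with -O2 would show whether the
two builds differ in optimization level.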