Re: squid "stops working" several times a day

Marcus Kool <marcus.kool@xxxxxxxxxxxxxxx> · Wed, 29 Feb 2012 12:22:44 -0300

The reads have many errors, about 25%, and your other email shows the error code EAGAIN.
All calls to connect() fail and all calls to recvfrom() fail.

It seems to me that the system has a resource problem:
it may be running out of file descriptors or other system resources.

I suggest looking at /var/log/messages and running strace again to
find out why connect, recvfrom and accept fail.
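
One way to check the file descriptor theory is to compare the number of
descriptors the squid process holds with its limit. A minimal sketch,
assuming a Linux /proc filesystem and enough privileges to read the
process's entries; the function names are illustrative and not part of
squid or strace:

```python
import os


def parse_max_open_files(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the row is missing or the limit is not numeric.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: Max, open, files, <soft>, <hard>, <units>
            soft = line.split()[3]
            return int(soft) if soft.isdigit() else None
    return None


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a running process (Linux only)."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        return open_fds, parse_max_open_files(fh.read())
```

Calling fd_usage(18021), with the PID that strace attached to in this
thread, during an incident and finding the first number close to the
second would support the theory.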

Marcus

karj wrote:
The output of strace at a normal time:

Process 18021 attached - interrupt to quit
^CProcess 18021 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 40.42    0.078685           1    131179        16 write
 23.38    0.045513           0    125618     32228 read
 11.07    0.021552           0     90238           epoll_ctl
  6.81    0.013266           0     64968           fcntl
  4.64    0.009038           1     12177           open
  3.95    0.007696           0     22260           close
  3.92    0.007640           0     24285           stat
  1.82    0.003541           0     19901           lseek
  1.82    0.003536           0     12021      2105 accept
  0.98    0.001913           0      4169           epoll_wait
  0.74    0.001440           0      9916           getsockname
  0.27    0.000533           0      8244           fstat
  0.14    0.000265           1       245       245 recvfrom
  0.02    0.000046           0       240       240 connect
  0.00    0.000000           0        12           brk
  0.00    0.000000           0       240           socket
  0.00    0.000000           0         3           sendto
  0.00    0.000000           0       240           bind
  0.00    0.000000           0       240           setsockopt
  0.00    0.000000           0       239           getsockopt
  0.00    0.000000           0         1           getrusage
  0.00    0.000000           0        15           getdents64
------ ----------- ----------- --------- --------- ----------------
100.00    0.194664                526451     34834 total
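
The per-syscall failure rates can be read off a summary like this one
mechanically. A minimal sketch, assuming the column layout shown above
(the errors column is blank when a syscall never failed); the function
names are illustrative, and the sample rows are copied from the table:

```python
def parse_strace_summary(text):
    """Return {syscall: (calls, errors)} from `strace -c` summary text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (no errors) or 6 fields (with errors).
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            float(fields[0])  # "% time" column; rejects the dashed rulers
            calls = int(fields[3])
            errors = int(fields[4]) if len(fields) == 6 else 0
        except ValueError:
            continue
        stats[fields[-1]] = (calls, errors)
    return stats


def error_rate(stats, syscall):
    """Percentage of calls to `syscall` that returned an error."""
    calls, errors = stats[syscall]
    return 100.0 * errors / calls if calls else 0.0


# A few rows copied from the summary above.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 23.38    0.045513           0    125618     32228 read
 11.07    0.021552           0     90238           epoll_ctl
  1.82    0.003536           0     12021      2105 accept
  0.14    0.000265           1       245       245 recvfrom
  0.02    0.000046           0       240       240 connect
------ ----------- ----------- --------- --------- ----------------
100.00    0.194664                526451     34834 total
"""

stats = parse_strace_summary(SAMPLE)
for name in stats:
    print(f"{name:10s} {error_rate(stats, name):6.1f}% of calls failed")
```

Comparing these percentages between a normal period and an incident
shows which syscalls change behaviour, which the raw counts alone do
not.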

Thanks again

-----Original Message-----
From: karj [mailto:gkaragiannidis@xxxxxxxxx] 
Sent: Wednesday, 29 February 2012 3:40 PM
To: 'Sebastian Muniz'; squid-users@xxxxxxxxxxxxxxx
Subject: RE:  squid "stops working" several times a day

I'm able to ping the machines.
The one thing I observed is that
during the crisis the squid process is using 100% of the CPU.
That happens on every server that has the problem...
I've tried to use strace, but without success, since the strace output is
huge.
What else can I do to identify the problem?
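
A huge strace log becomes manageable once only the failing calls are
kept and grouped by error code. A minimal sketch, assuming the trace was
saved with something like `strace -o squid.strace -p 18021` (the file
name is hypothetical) and that lines follow strace's usual
`name(args) = -1 ECODE (text)` shape. The sample lines are made up for
illustration, and calls split across "unfinished"/"resumed" lines are
ignored:

```python
import re
from collections import Counter

# syscall name, its arguments, then a -1 return followed by the errno name
FAILED = re.compile(r"(\w+)\(.*\)\s+= -1 (E[A-Z0-9]+)")


def tally_failures(lines):
    """Count failed syscalls as {(syscall, errno_name): occurrences}."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# Illustrative lines only; not taken from the servers in this thread.
SAMPLE = """\
read(14, 0x7fff5c0, 4096) = -1 EAGAIN (Resource temporarily unavailable)
read(15, "GET / HTTP/1.0", 4096) = 14
accept(11, 0x7fff5d0, [16]) = -1 EAGAIN (Resource temporarily unavailable)
connect(16, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.0.2.10")}, 16) = -1 EINPROGRESS (Operation now in progress)
read(14, 0x7fff5c0, 4096) = -1 EAGAIN (Resource temporarily unavailable)
"""

counts = tally_failures(SAMPLE.splitlines())
for (syscall, errno_name), n in counts.most_common():
    print(f"{n:6d}  {syscall:10s} {errno_name}")
```

For a real trace, `tally_failures(open("squid.strace"))` works the same
way. The error code matters: on non-blocking sockets, codes such as
EAGAIN or EINPROGRESS can be routine, while something like EMFILE would
point at descriptor exhaustion.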

At the time of the problem, cache.log suggests that squid loses connectivity
with almost everybody:

2012/02/29 09:15:51| TCP connection to assets_servers (xxx.xxx.xxx.xxx:80) failed
2012/02/29 09:15:51| TCP connection to typos_servers (xxx.xxx.xxx.xxx:80) failed
2012/02/29 09:15:51| Detected DEAD Parent: tityros_servers
2012/02/29 09:15:51| TCP connection to assets_servers (xxx.xxx.xxx.xxx:80) failed
2012/02/29 09:15:51| TCP connection to tityros_servers (xxx.xxx.xxx.xxx:80) failed
2012/02/29 09:15:51| TCP connection to typos_servers (xxx.xxx.xxx.xxx:80) failed

From another sibling's log at the same time:
2012/02/29 09:15:51| Detected DEAD Sibling: xxx.xxx.xxx.xxx
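
To line up incidents across the five servers, the cache.log entries can
be counted per second and per peer. A minimal sketch, assuming each
entry sits on one line in the log file and matches the message formats
quoted in this thread; the function name is illustrative:

```python
import re
from collections import Counter

LINE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\| (.*)$")
CONN = re.compile(r"^TCP connection to (\S+) \(.*\) failed")
DEAD = re.compile(r"^Detected DEAD (Parent|Sibling): (\S+)")


def summarize(lines):
    """Return (failures, dead) extracted from cache.log lines.

    failures: Counter keyed by (timestamp, peer_name)
    dead:     list of (timestamp, "Parent" or "Sibling", peer)
    """
    failures = Counter()
    dead = []
    for line in lines:
        entry = LINE.match(line.rstrip("\n"))
        if not entry:
            continue
        stamp, message = entry.groups()
        conn = CONN.match(message)
        if conn:
            failures[(stamp, conn.group(1))] += 1
            continue
        gone = DEAD.match(message)
        if gone:
            dead.append((stamp, gone.group(1), gone.group(2)))
    return failures, dead


# Entries copied from this message, one per line.
SAMPLE = """\
2012/02/29 09:15:51| TCP connection to assets_servers (xxx.xxx.xxx.xxx:80) failed
2012/02/29 09:15:51| TCP connection to typos_servers (xxx.xxx.xxx.xxx:80) failed
2012/02/29 09:15:51| Detected DEAD Parent: tityros_servers
2012/02/29 09:15:51| TCP connection to assets_servers (xxx.xxx.xxx.xxx:80) failed
2012/02/29 09:15:51| TCP connection to tityros_servers (xxx.xxx.xxx.xxx:80) failed
2012/02/29 09:15:51| TCP connection to typos_servers (xxx.xxx.xxx.xxx:80) failed
"""

failures, dead = summarize(SAMPLE.splitlines())
for (stamp, peer), n in sorted(failures.items()):
    print(stamp, peer, n)
```

Running this over the logs of all siblings and comparing timestamps
would show whether every server loses its peers in the same second or
whether the failures are staggered.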

Thanks in advance
Yiannis

-----Original Message-----
From: Sebastian Muniz [mailto:basurerosebita@xxxxxxxxx]
Sent: Tuesday, 28 February 2012 11:52 PM
To: squid-users@xxxxxxxxxxxxxxx
Subject: Re:  squid "stops working" several times a day

On 2/28/2012 2:54 PM, karj wrote:
Hi All,
I have a problem with my squids.
Squid "stops working" several times a day.
The only thing that warns me that something is wrong in cache.log is 
the "Detected DEAD Sibling: xxx.xx.xxx.xxx" message.
After a few seconds everything goes back to normal.
We are using 5 squids (version 2.7.STABLE9) in accelerator mode, which
are siblings of each other.
So we have 5 sibling squids in front of our web farms, serving almost
7000 requests per second at peak time and an average of 4500 requests
per second.
The problem occurs randomly on all servers...

Are you able to reach (telnet or ping or anything) the sibling during the
times that squid stops working?
What can you tell us about the sibling logs, especially the cache.log?

Regards
Sebastian