Search squid archive

Re: Squid losing connectivity for 30 seconds

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




 Hi,

I am currently facing a problem that I wasn't able to find a solution for in the mailing list or on the internet, My squid is dying for 30 seconds every one hour at the same exact time, squid process will still be running, I lose my wccp connectivity, the cache peers detect the squid as a dead sibling, and the squid cannot server any requests The network connectivity of the sever is not affected (a ping to the squid's ip doesn't timeout)

The problem doesn't start immediately when the squid is installed on the server (The server is dedicated as a squid)
It starts when the cache directories starts to fill up,
I have started my setup with 10 cache directors, the squid will start having the problem when the cache directories are above 50% filled when i change the number of cache directory (9,8,...) the squid works for a while then the same problem
cache_dir aufs /cache1/squid 90000 140 256
cache_dir aufs /cache2/squid 90000 140 256
cache_dir aufs /cache3/squid 90000 140 256
cache_dir aufs /cache4/squid 90000 140 256
cache_dir aufs /cache5/squid 90000 140 256
cache_dir aufs /cache6/squid 90000 140 256
cache_dir aufs /cache7/squid 90000 140 256
cache_dir aufs /cache8/squid 90000 140 256
cache_dir aufs /cache9/squid 90000 140 256
cache_dir aufs /cache10/squid 80000 140 256

I have 1 terabyte of storage
Finally I created two cache dircetories (One on each HDD) but the problem persisted

You have 2 HDD?  but, but, you have 10 cache_dir.
We repeatedly say "one cache_dir per disk" or similar. In particular one cache_dir per physical drive spindle (for "disks" made up of multiple physical spindles) wherever possible with physical drives/spindles mounting separately to ensure the pairing. Squid performs a very unusual pattern of disk I/O which stress them down to the hardware controller level and make this kind of detail critical for anything like good speed. Avoiding cache_dir object limitations by adding more UFS-based dirs to one disk does not improve the situation.

That is a problem which will be affecting your Squid all the time though, possibly making the source of the pause worse.

From teh description I believe it is garbage collection on the cache directories. The pauses can be visible when garbage collecting any caches over a few dozen GB. The squid default "swap_high" and "swap_low" values are "5" apart, with at minimum being a value of 0 apart. These are whole % points of the total cache size, being erased from disk in a somewhat random-access style across the cache area. I did mention uncommon disk I/O patterns, right?

To be sure what it is, you can use the "strace" tool to the squid worker process (the second PID in current stable Squids) and see what is running. But given the hourly regularity and past experience with others on similar cache sizes, I'm almost certain its the garbage collection.

Amos


Hi Amos,

Thank you for your fast reply,
I have 2 HDD (450GB and 600GB)
df -h displays that i have 357Gb and 505GB available
In my last test, my cache dir where:
cache_swap_low 90
cache_swap_high 95

This is not. For anything more than 10-20 GB I recommend setting it to no more than 1 apart, possibly the same value if that works. Squid has a light but CPU-intensive and possibly long garbage removal cycle above cache_swap_low, and a much more aggressive but faster and less CPU intensive removal above cache_swap_high. On large caches it is better in terms of downtime going straight to the aggressive removal and clearing disk space fast, despite the bandwidth cost replacing any items the light removal would have left.


Amos

Hi Amos,

I have changed the swap_high 90 and swap_low 90 with two cache dir (one for each HDD), i still have the same problem,
I did an strace (when the problem occured)
------ ----------- ----------- --------- --------- ----------------
 23.06    0.004769           0     85681        96 write
 21.07    0.004359           0     24658         5 futex
 19.34    0.004001         800         5           open
  6.54    0.001352           0      5101      5101 connect
  6.46    0.001337           3       491           epoll_wait
  5.34    0.001104           0     51938      9453 read
  3.90    0.000806           0     39727           close
  3.54    0.000733           0     86400           epoll_ctl
  3.54    0.000732           0     32357           sendto
  2.02    0.000417           0     56721           recvmsg
  1.84    0.000381           0     24064           socket
  0.96    0.000199           0     56264           fcntl
  0.77    0.000159           0      6366       329 accept
  0.53    0.000109           0     24033           bind
  0.52    0.000108           0     30085           getsockname
  0.21    0.000044           0     11200           stat
  0.21    0.000044           0      6998       359 recvfrom
  0.09    0.000019           0      5085           getsockopt
  0.06    0.000012           0      2887           lseek
  0.00    0.000000           0        98           brk
  0.00    0.000000           0        16           dup2
  0.00    0.000000           0     10314           setsockopt
  0.00    0.000000           0         4           getdents
  0.00    0.000000           0         3           getrusage
------ ----------- ----------- --------- --------- ----------------
100.00    0.020685                560496     15343 total

this is the strace of squid when it is working normally:
------ ----------- ----------- --------- --------- ----------------
 24.88    0.015887           0    455793       169 write
 13.72    0.008764           0    112185           epoll_wait
 11.67    0.007454           0    256256     27158 read
  8.47    0.005408           0    169133           sendto
  6.94    0.004430           0    159596           close
  6.85    0.004373           0    387359           epoll_ctl
  6.42    0.004102           0     19651     19651 connect
  5.54    0.003538           0    290289           recvmsg
  3.81    0.002431           0    116515           socket
  3.53    0.002254           0    164750           futex
  1.68    0.001075           0    207688           fcntl
  1.53    0.000974           0     95228     23139 recvfrom
  1.29    0.000821           0     33408     12259 accept
  1.14    0.000726           0     46582           stat
  1.11    0.000707           0    110826           bind
  0.85    0.000544           0    137574           getsockname
  0.32    0.000204           0     21642           getsockopt
  0.26    0.000165           0     39502           setsockopt
  0.01    0.000007           0      8092           lseek
  0.00    0.000000           0       248           open
  0.00    0.000000           0         4           brk
  0.00    0.000000           0        88           dup2
  0.00    0.000000           0        14           getdents
  0.00    0.000000           0         6           getrusage
------ ----------- ----------- --------- --------- ----------------
100.00    0.063864               2832429     82376 total

Do you have any suggestions to solve the issue, can I run the garbage collector more frequently, is it better to change the cache_dir type from aufs to something else?
Do you see the problem in the strace?

Thank you,
Elie


Hi,

Please note that squid is facing the same problem even when their is no activity or any clients connected to it

Regards
Elie

Hi,

here is the strace result
-----------------------------------------------------------------------------------------------------
connect(7595, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("2.22.50.75")}, 16) = -1 EINPROGRESS (Operation now in progress) epoll_ctl(6, EPOLL_CTL_ADD, 7595, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=7595, u64=13133095884688465323}}) = 0 futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, "\2\0\0\0\375\254\21\0002r\316N\0\0\0\0002r\316N\0\0\0\0\262\245\257P\0\0\0\0"..., 72) = 72 futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, "\2\0\0\0\306\254\21\0000r\316N\0\0\0\0000r\316N\0\0\0\0\377\377\377\377\377\377\377\377"..., 72) = 72 futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, "\2\0\0\0\262\251\21\0\360\376\315N\0\0\0\0002r\316N\0\0\0\0\377\377\377\377\377\377\377\377"..., 72) = 72 futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, "\2\0\0\0\2\255\21\0002r\316N\0\0\0\0003r\316N\0\0\0\0\321\376\316N\0\0\0\0"..., 72) = 72 futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, "\2\0\0\0\3\255\21\0002r\316N\0\0\0\0003r\316N\0\0\0\0002\347\340N\0\0\0\0"..., 72) = 72 futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, "\2\0\0\0\371\254\21\0002r\316N\0\0\0\0002r\316N\0\0\0\0\377\377\377\377\377\377\377\377"..., 72) = 72 futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, "\2\0\0\0\n\255\21\0003r\316N\0\0\0\0003r\316N\0\0\0\0\263\245\257P\0\0\0\0"..., 72) = 72 futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, "\2\0\0\0\v\255\21\0003r\316N\0\0\0\0003r\316N\0\0\0\0\323\370\317N\0\0\0\0"..., 72) = 72 futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, "\2\0\0\0\f\255\21\0003r\316N\0\0\0\0003r\316N\0\0\0\0003\377\365N\0\0\0\0"..., 72) = 72 futex(0x9f65ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x9f65e8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x9f65c0, FUTEX_WAKE_PRIVATE, 1)  = 1
write(35, "\2\0\0\0\334\251\21\0@\\\316N\0\0\0\0003r\316N\0\0\0\0\377\377\377\377\377\377\377\377"..., 72) = 72
read(165, "!", 256)                     = 1
close(379)                              = 0
epoll_ctl(6, EPOLL_CTL_DEL, 9525, {0, {u32=9525, u64=25160128873375029}}) = 0 epoll_ctl(6, EPOLL_CTL_ADD, 9525, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=9525, u64=13133094175291483445}}) = 0 epoll_ctl(6, EPOLL_CTL_MOD, 9525, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=9525, u64=23686740342482229}}) = 0 epoll_ctl(6, EPOLL_CTL_MOD, 4554, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=4554, u64=549755818442}}) = 0
close(68)                               = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 68
fcntl(68, F_GETFD)                      = 0
fcntl(68, F_SETFD, FD_CLOEXEC)          = 0
setsockopt(68, SOL_IP, 0x13 /* IP_??? */, [1], 4) = 0
bind(68, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("95.141.62.17")}, 16) = 0
fcntl(68, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(68, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
setsockopt(68, SOL_TCP, TCP_NODELAY, [1], 4) = 0
getsockname(68, {sa_family=AF_INET, sin_port=htons(60303), sin_addr=inet_addr("95.141.62.17")}, [16]) = 0
socket(PF_NETLINK, SOCK_RAW, 0)         = 379
bind(379, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(379, {sa_family=AF_NETLINK, pid=11594, groups=00000000}, [12]) = 0 sendto(379, "\24\0\0\0\26\0\1\3t\232\324N\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20 recvmsg(379, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0t\232\324NJ-\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 108 recvmsg(379, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"@\0\0\0\24\0\2\0t\232\324NJ-\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 128 recvmsg(379, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0t\232\324NJ-\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
close(379)                              = 0
connect(68, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("2.22.50.65")}, 16) = -1 EINPROGRESS (Operation now in progress) epoll_ctl(6, EPOLL_CTL_ADD, 68, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=68, u64=13133092173836714052}}) = 0 epoll_ctl(6, EPOLL_CTL_MOD, 3658, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=3658, u64=549755817546}}) = 0 sendto(10, "\33\221\1\0\0\1\0\0\0\0\0\0\nliveupdate\vgocyberl"..., 44, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 44 sendto(10, "\275\251\1\0\0\1\0\0\0\0\0\0\2tc\2v3\6cache3\1c\4pack"..., 48, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 48 sendto(10, "\267\355\1\0\0\1\0\0\0\0\0\0\3crl\6thawte\3com\0\0\34\0\1", 32, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 32 sendto(10, "\312\237\1\0\0\1\0\0\0\0\0\0\2p4\rbhzfrwddqtb66\20cg"..., 79, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 79 sendto(10, "\261\216\1\0\0\1\0\0\0\0\0\0\4apps\10facebook\3com\0\0"..., 35, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 35 sendto(10, "\237\10\1\0\0\1\0\0\0\0\0\0\6sn121w\6snt121\4mail\4"..., 45, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 45 sendto(10, "FR\1\0\0\1\0\0\0\0\0\0\3api\6js-kit\3com\0\0\1\0\1", 32, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 32 sendto(10, "\207$\1\0\0\1\0\0\0\0\0\0\1l\naddthiscdn\3com\0\0\1"..., 34, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 34 sendto(10, "\265#\1\0\0\1\0\0\0\0\0\0\5usage\7hosting\7toolb"..., 60, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 60 sendto(10, "hZ\1\0\0\1\0\0\0\0\0\0\3www\ntargetgulf\3com\0"..., 36, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 36 sendto(10, "\355\345\1\0\0\1\0\0\0\0\0\0\10services\4apps\7condu"..., 43, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 43 sendto(10, "O\254\1\0\0\1\0\0\0\0\0\0\0060-jw-w\7channel\10face"..., 45, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 45 sendto(10, "\330\310\1\0\0\1\0\0\0\0\0\0\2t1\7gstatic\3com\0\0\1\0\1", 32, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 32 sendto(10, "\16Z\1\0\0\1\0\0\0\0\0\0\4mail\3aol\3com\0\0\1\0\1", 30, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 30 sendto(10, "\327\216\1\0\0\1\0\0\0\0\0\0\3www\10bisara7a\3com\0\0\1"..., 34, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 34 sendto(10, "o\201\1\0\0\1\0\0\0\0\0\0\003317\7channel\10faceboo"..., 42, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 42 sendto(10, "\344?\1\0\0\1\0\0\0\0\0\0\7email03\fsecureserve"..., 42, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.249.209.102")}, 16) = 42 sendto(21, "\0\0\0\n\2\0\0|\0\0\0\4\0\0\0\0\0\1\0\30\0017\360\6\0\0\0002\0P\0\0"..., 132, 0, NULL, 0) = 132 sendto(21, "\0\0\0\n\2\0\0|\0\0\0\4\0\0\0\0\0\1\0\30\1<\360\6\0\0\0\21\0P\0\0"..., 132, 0, NULL, 0) = 132 open("/cache2/pool6/squid/13/AB", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 379
getdents(379, /* 254 entries */, 32768) = 8112
getdents(379, /* 0 entries */, 32768)   = 0
close(379)                              = 0
read(165, "!", 256)                     = 1
----------------------------------------------------------------------------------------------------------------------------------------------------
Squid is freezing at this point
Please advise

Thank you
Elie


[Index of Archives]     [Linux Audio Users]     [Samba]     [Big List of Linux Books]     [Linux USB]     [Yosemite News]

  Powered by Linux