Felipe W Damasio wrote:
Hi all,

Sorry for the long email.

I'm using squid on a 300Mbps ISP with about 10,000 users. I have an 8-core Intel i7 machine, with 8GB of RAM and 500 of HD for the cache (exclusive SATA HD with xfs). I'm using aufs as storeio.

I'm caching mostly multimedia files (youtube and such).

Squid usually eats around 50-70% of one core.

But always around midnight (when a lot of users browse the internet), my squid becomes very slow. I mean, a page that usually takes 0.04s to load takes 23 seconds to load.

My best guess is that the volume of traffic is making squid slow.

I'm using a 2.6.29.6 vanilla kernel with tproxy enabled for squid. And I'm using these /proc configurations:

    echo 0 > /proc/sys/net/ipv4/tcp_ecn
    echo 1 > /proc/sys/net/ipv4/tcp_low_latency
    echo 100000 > /proc/sys/net/core/netdev_max_backlog
    echo 409600 > /proc/sys/net/ipv4/tcp_max_syn_backlog
    echo 7 > /proc/sys/net/ipv4/tcp_fin_timeout
    echo 15 > /proc/sys/net/ipv4/tcp_keepalive_intvl
    echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes
    echo 65536 > /proc/sys/vm/min_free_kbytes
    echo "262144 1024000 4194304" > /proc/sys/net/ipv4/tcp_rmem
    echo "262144 1024000 4194304" > /proc/sys/net/ipv4/tcp_wmem
    echo "1024000" > /proc/sys/net/core/rmem_max
    echo "1024000" > /proc/sys/net/core/wmem_max
    echo "512000" > /proc/sys/net/core/rmem_default
    echo "512000" > /proc/sys/net/core/wmem_default
    echo "524288" > /proc/sys/net/ipv4/netfilter/ip_conntrack_max
    echo "3" > /proc/sys/net/ipv4/tcp_synack_retries

The machine is in bridge-mode.
I wrote a little script that prints:

- The date;
- The "/usr/bin/time squidclient http://www.amazon.com";
- The number of ESTABLISHED connections (through netstat -an);
- The number of TIME_WAIT connections;
- The total number of netstat connections;
- The route cache (ip route list cache);
- The number of clients currently connected in squid (through mgr:info);
- The amount of free memory in MB (free -m);
- The % used of the squid-running core;
- The average time to respond to a request (mgr:info also), 5 minutes avg;
- The average number of http requests / sec (5 minutes avg), mgr:info as well.

On any other hour, I have something like:

2010-01-25 18:48:19 ; 0.04 ; 19383 ; 9902 ; 29865 ; 96972 ; 4677 ; 131 ; 59 ; 0.24524 ; 476.871718
2010-01-25 18:53:29 ; 0.04 ; 18865 ; 8593 ; 30123 ; 179570 ; 4679 ; 148 ; 62 ; 0.22004 ; 504.424207
2010-01-25 18:58:38 ; 0.04 ; 18377 ; 9056 ; 29283 ; 99038 ; 4680 ; 174 ; 61 ; 0.22004 ; 466.659336
2010-01-25 19:03:49 ; 0.04 ; 18877 ; 9133 ; 28327 ; 181196 ; 4673 ; 171 ; 57 ; 0.24524 ; 483.558436

So, it takes around 0.04s to get http://www.amazon.com.

But around midnight I get:

2010-01-24 23:46:50 ; 2.53 ; 22723 ; 9861 ; 35012 ; 64752 ; 4306 ; 166 ; 70 ; 0.22004 ; 566.364274
2010-01-24 23:52:04 ; 3.74 ; 21173 ; 10256 ; 33242 ; 167594 ; 4309 ; 169 ; 68 ; 0.20843 ; 537.758601
2010-01-24 23:57:20 ; 0.08 ; 18691 ; 9050 ; 29590 ; 65496 ; 4312 ; 138 ; 71 ; 0.20843 ; 525.119006
2010-01-25 00:02:29 ; 15.54 ; 18016 ; 8209 ; 29035 ; 149248 ; 4318 ; 160 ; 82 ; 0.25890 ; 491.615241

As I said, it goes from 0.04 to 15.54s(!) to get a single html file. Horrible. After 12:30, everything goes back to normal.

From those variables, I can't find any indication of what could be causing this appalling slowdown. The number of squid users doesn't go up that much; I just see that the avg time squid reports for answering a request goes from 0.20s to 0.25s, and the number of http requests/sec actually goes down from 566 to 491, which is kind of odd to me.
And the number of users using squid stays at around 4300.

I talked to Mr. Dave Dykstra, and he thought it could be I/O delay issues. So I tried:

    cache_dir null /tmp
    cache_access_log none
    cache_store_log none

But no luck, at midnight tonight things went wild again:

2010-01-25 23:57:03 ; 0.04 ; 24112 ; 11330 ; 37240 ; 74456 ; 3516 ; 160 ; 58 ; 0.25890 ; 581.047037
2010-01-26 00:02:15 ; 10.82 ; 25638 ; 11695 ; 38537 ; 177198 ; 3533 ; 149 ; 78 ; 0.27332 ; 570.312936
2010-01-26 00:07:38 ; 42.64 ; 23818 ; 11563 ; 38097 ; 88902 ; 3556 ; 171 ; 70 ; 0.30459 ; 585.880418

From 0.04 to 42 seconds to load the main html page of amazon.com. (!)

Do you have any idea, or any other data I can collect, to try and track this down?
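
The semicolon-separated samples quoted above carry the fetch time in their second field, so the slow periods can be pulled out with a short filter. What follows is only a minimal sketch: the function name slow_samples and the 1-second default threshold are assumptions for illustration, not part of the original script.

```shell
#!/bin/sh
# Read monitoring samples on stdin and print "<timestamp> <fetch time>"
# for every sample whose fetch time (field 2) exceeds a threshold.
# Assumption: fields are separated by ';' as in the samples quoted above.
# slow_samples is a hypothetical helper name.
slow_samples() {
    # $1 = threshold in seconds (defaults to 1)
    awk -F';' -v limit="${1:-1}" '
        {
            t = $2 + 0                    # coerce " 10.82 " to a number
            if (t > limit + 0) {
                gsub(/^ +| +$/, "", $1)   # trim blanks around the timestamp
                print $1, t
            }
        }'
}

# Demo using three samples copied from the message above.
slow_samples 1 <<'EOF'
2010-01-25 23:57:03 ; 0.04 ; 24112 ; 11330 ; 37240 ; 74456 ; 3516 ; 160 ; 58 ; 0.25890 ; 581.047037
2010-01-26 00:02:15 ; 10.82 ; 25638 ; 11695 ; 38537 ; 177198 ; 3533 ; 149 ; 78 ; 0.27332 ; 570.312936
2010-01-26 00:07:38 ; 42.64 ; 23818 ; 11563 ; 38097 ; 88902 ; 3556 ; 171 ; 70 ; 0.30459 ; 585.880418
EOF
```

Run over several days of samples, this shows whether the slowdown always starts at the same minute, which would point at a scheduled job rather than at traffic volume.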
Check your log rotation schedule. Is it possible that logs are being rotated at midnight? I think that the swap.state file is rewritten when "squid -k rotate" is called. Check the beginning of your cache.log to verify.
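
One way to run that check, as a minimal sketch. It assumes that cache.log lines start with a "YYYY/MM/DD HH:MM:SS|" timestamp and that a rotate shows up as a "storeDirWriteCleanLogs" message; both should be verified against your own log. The function name rotations_near_midnight and the 23:50 to 00:10 window are made up for illustration.

```shell
#!/bin/sh
# Read cache.log on stdin and print the clean-log (swap.state rewrite)
# messages that were logged between 23:50 and 00:10.
# Assumptions: lines begin with "YYYY/MM/DD HH:MM:SS|" and a rotate logs
# a line containing "storeDirWriteCleanLogs".
# rotations_near_midnight is a hypothetical helper name.
rotations_near_midnight() {
    grep 'storeDirWriteCleanLogs' | awk '
        {
            split($2, hms, ":")           # $2 looks like "00:00:03|"
            mins = hms[1] * 60 + hms[2]   # minutes since midnight
            if (mins >= 23 * 60 + 50 || mins <= 10) print
        }'
}
```

Feed it your cache.log on stdin (its location depends on your install). If it prints lines around 00:00, also look for a cron or logrotate entry that calls "squid -k rotate" at that time.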
I'm using squid-2.7.stable7, but I'm willing to try squid-3.0 or squid-3.1 if you guys think it could help.

I'm using 2 gigabit Marvell Ethernet boards with sky2 driver. Don't know if it's relevant, though.

If you guys need any more info to try and help me figure this out, please ask. I'm willing to test, code or do pretty much anything to make squid perform better on my environment.

Please let me know how can I help you help me. :-)

Thanks!

Felipe Damasio
Chris