Hello,

Thanks for the suggestions. I've moved up to Squid version 3.3.5, changed the RAID5 into a RAID0, tweaked the value of cache_dir to 100000, moved the acl manager lines, removed hierarchy_stoplist, and enabled memory_pools. I have also added RPS monitoring to our Cacti instance so I can get a better idea when I give this a shot again.

When running multiple Squid processes to deal with multiple (slow) cores, how many processes do you recommend running? One per core, or fewer? I currently have 10 processes set up on 12 cores but do not know if this is the correct way to go about it.

Thanks,
Richard

On Mon, Aug 19, 2013 at 3:27 AM, Amos Jeffries <squid3@xxxxxxxxxxxxx> wrote:
> On 17/08/2013 6:45 a.m., inittab wrote:
>>
>> Hello,
>>
>> I wanted to get some suggestions on my current setup and ask if I'm
>> expecting too much out of my hardware for the traffic load.
>
> Sorry for the slow reply.
>
> NOTE: If you determine that it is a memory leak, please upgrade to the
> current Squid-3.3 or later versions. There are a few dozen leaks of
> various sizes in the 3.1 and 3.2 series which have been fixed. Not
> everybody is hitting them, since each one is triggered by specific
> behaviour, but you may be.
>
>> It appears I am running into out-of-memory problems and hitting swap;
>> squid processes then end up dying off.
>>
>> [root@squid01 squid]# dmesg | grep "page allocation"
>> swapper: page allocation failure. order:1, mode:0x20
>> kswapd0: page allocation failure. order:1, mode:0x20
>>   [the kswapd0 line repeats nine times]
>> squid: page allocation failure. order:1, mode:0x20
>>
>> I currently have 2 Dell 2950s running Squid 3.1.10; we generally see
>> ~200 Mbps total.
>
> How many HTTP requests/second? That is the most relevant traffic speed
> metric for Squid.
>
> FYI: 200 Mbps of traffic could be coming from one single HTTPS / CONNECT
> request per day, or from a million IMS requests. The effect on and by
> Squid CPU and memory is drastically different for each of those cases and
> varies greatly for all permutations in between.
>
> Each request requires some KB amount of buffer memory - compare 1
> request/day with a million requests/day and you can see where the
> relevance starts to appear for your particular problem.
>
>> Box stats are:
>> 2x Six-Core AMD Opteron(tm) Processor 2427 @ 2.2 GHz
>> 32 GB RAM
>> 1x Intel E1G44HTBLK Server Adapter I340-T4, all 4 ports bonded with 802.3ad
>> /var/spool/squid 512 GB RAID5
>
> Ah, RAID. Well, there are some more disk I/O overheads you could possibly
> avoid:
> http://wiki.squid-cache.org/SquidFaq/RAID
>
> Keep in mind that the cache data is effectively a local _backup_ of data
> held elsewhere. It is non-critical. The only benefit you gain from RAID
> is advance warning about disk failures and some time to correct them
> without Squid crashing.
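[Editor's note: a rough way to obtain the requests/second figure Amos asks about is to derive it from access.log timestamps. A minimal sketch, assuming the native Squid log format in which field 1 is a UNIX timestamp; the config below sets emulate_httpd_log on, which changes the format, so the field number would need adjusting there.]

```shell
# Average requests/sec over the whole access.log window.
# Assumes native Squid log format: field 1 is a UNIX timestamp.
# The log path is the stock location; adjust for your install.
awk '
NR == 1 { first = $1 }      # timestamp of the oldest entry
        { last = $1; n++ }  # newest entry seen so far, plus request count
END {
    span = last - first
    if (span > 0)
        printf "%.1f req/s over %d requests\n", n / span, n
    else
        print "log window too short to compute a rate"
}' /var/log/squid/access.log
```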
>> The boxes are both running 10 squid processes on different ports in
>> transparent mode. I am using iptables rules to redirect traffic to the
>> different squid ports, e.g.:
>>
>>  22M 1351M REDIRECT tcp -- * * 10.96.0.0/15 0.0.0.0/0 statistic mode random probability 0.100000 tcp dpt:80 redir ports 3120
>>  20M 1216M REDIRECT tcp -- * * 10.96.0.0/15 0.0.0.0/0 statistic mode random probability 0.100000 tcp dpt:80 redir ports 3121
>>  18M 1094M REDIRECT tcp -- * * 10.96.0.0/15 0.0.0.0/0 statistic mode random probability 0.100000 tcp dpt:80 redir ports 3122
>>  16M  985M REDIRECT tcp -- * * 10.96.0.0/15 0.0.0.0/0 statistic mode random probability 0.100000 tcp dpt:80 redir ports 3123
>>  15M  886M REDIRECT tcp -- * * 10.96.0.0/15 0.0.0.0/0 statistic mode random probability 0.100000 tcp dpt:80 redir ports 3124
>>  13M  798M REDIRECT tcp -- * * 10.96.0.0/15 0.0.0.0/0 statistic mode random probability 0.100000 tcp dpt:80 redir ports 3125
>>  12M  718M REDIRECT tcp -- * * 10.96.0.0/15 0.0.0.0/0 statistic mode random probability 0.100000 tcp dpt:80 redir ports 3126
>>  11M  647M REDIRECT tcp -- * * 10.96.0.0/15 0.0.0.0/0 statistic mode random probability 0.100000 tcp dpt:80 redir ports 3127
>> 9631K 582M REDIRECT tcp -- * * 10.96.0.0/15 0.0.0.0/0 statistic mode random probability 0.100000 tcp dpt:80 redir ports 3128
>> 8668K 524M REDIRECT tcp -- * * 10.96.0.0/15 0.0.0.0/0 statistic mode random probability 0.100000 tcp dpt:80 redir ports 3129
>>
>> sysctl.conf:
>> net.ipv4.ip_forward = 0
>> net.ipv4.conf.default.rp_filter = 1
>> net.ipv4.conf.default.accept_source_route = 0
>> kernel.sysrq = 0
>> kernel.core_uses_pid = 1
>> net.ipv4.tcp_syncookies = 1
>> net.bridge.bridge-nf-call-ip6tables = 0
>> net.bridge.bridge-nf-call-iptables = 0
>> net.bridge.bridge-nf-call-arptables = 0
>> kernel.msgmnb = 65536
>> kernel.msgmax = 65536
>> kernel.shmmax = 68719476736
>> kernel.shmall = 4294967296
>> net.netfilter.nf_conntrack_max = 196608
>>
>> Example squid config file: squid-p3120.conf
>> acl adminnet src 10.3.25.0/24
>> acl proxyvlan src 10.5.22.0/24
>> acl SSL_ports port 443
>> acl Safe_ports port 80          # http
>> acl Safe_ports port 21          # ftp
>> acl Safe_ports port 443         # https
>> acl Safe_ports port 70          # gopher
>> acl Safe_ports port 210         # wais
>> acl Safe_ports port 1025-65535  # unregistered ports
>> acl Safe_ports port 280         # http-mgmt
>> acl Safe_ports port 488         # gss-http
>> acl Safe_ports port 591         # filemaker
>> acl Safe_ports port 777         # multiling http
>> acl CONNECT method CONNECT
>> http_access allow manager localhost
>> http_access allow manager adminnet
>> http_access allow manager proxyvlan
>> http_access deny manager
>
> For high-speed Squid-3.2 or later I am recommending that people at least
> place the manager ACL tests down ...
>
>> http_access deny !Safe_ports
>> http_access deny CONNECT !SSL_ports
>> http_access deny to_localhost
>
> ... here, so that the faster port rejections can protect better against
> some DDoS issues. ("manager" has become a regex test.)
>
>> http_access allow localhost
>> http_access allow customers
>
> NP: given that the above ACLs are all allows, in this proxy you could even
> move the manager allow lines down to here. Manager requests will be far
> rarer than your normal client traffic, I think.
>
>> http_access deny all
>> hierarchy_stoplist cgi-bin ?
>
> You can simplify the config by removing hierarchy_stoplist.
>
>> coredump_dir /var/spool/squid/p3120
>> refresh_pattern ^ftp: 1440 20% 10080
>> refresh_pattern ^gopher: 1440 0% 1440
>> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
>> refresh_pattern . 0 20% 4320
>> hosts_file /etc/hosts
>> dns_nameservers 10.5.7.13 10.5.7.23
>> cache_replacement_policy heap LFUDA
>> cache_swap_low 90
>> cache_swap_high 95
>> maximum_object_size_in_memory 96 KB
>> maximum_object_size 100 MB
>> cache_dir aufs /var/spool/squid/p3120 204800 16 256
>> cache_mem 100 MB
>> logfile_rotate 10
>> memory_pools off
>
> It does vary between installations, but memory_pools can reduce a lot of
> memory allocator overhead when it is enabled.
>
>> quick_abort_min 0 KB
>> quick_abort_max 0 KB
>> log_icp_queries off
>> client_db off
>> buffered_logs on
>> half_closed_clients off
>> url_rewrite_children 20
>> pid_filename /var/run/squid-p3120.pid
>> unique_hostname squid01-p3120.eng.XXXXXX
>> visible_hostname squid.eng.XXXXXXX
>> icp_port 3100
>> tcp_outgoing_address 10.5.22.101
>> emulate_httpd_log on
>>
>> Does anyone have suggestions on whether I'm doing something terribly
>> wrong here or missing some kind of performance tuning?
>
> Your memory requirements in MB of RAM per proxy are:
>   100 (cache_mem) + 15*0.1 (cache_mem index) + 15*205 (cache_dir index)
>   + 0.25 * R (active request buffers)
>
> I note that this is already 3.1 GB per proxy just for the index values,
> so 10 proxies will leave only ~1 GB of RAM for operating system use,
> other processes, and Squid's active request buffering.
>
> I suggest dropping the cache_dir size to 100000 and measuring the RAM
> usage on the box to see how much you can increase it back up.
>
> Amos
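[Editor's note: Amos's RAM formula can be sanity-checked with a quick back-of-envelope calculation. This sketch uses his figures (~15 MB of index RAM per GB of cache, the 204800 MB cache_dir rounded to 205 GB as in his arithmetic) and omits the 0.25*R request-buffer term, since R, the number of concurrent requests, is unknown here.]

```shell
# Per-process and total RAM estimate for the 10-proxy setup.
awk 'BEGIN {
    cache_mem_mb  = 100   # cache_mem per process
    disk_cache_gb = 205   # cache_dir 204800 MB, rounded as in the post
    procs         = 10

    # cache_mem itself, plus ~15 MB of index RAM per GB of cache
    per_proc_mb = cache_mem_mb + 15 * cache_mem_mb / 1024 + 15 * disk_cache_gb
    total_gb    = per_proc_mb * procs / 1024

    printf "per process: %.1f MB\n", per_proc_mb
    printf "all %d processes: %.1f GB of 32 GB RAM\n", procs, total_gb
}'
```

That leaves roughly 1 GB of the 32 GB for the OS and request buffers, matching the estimate in the reply, and shows why shrinking cache_dir (or the process count) is the lever to pull.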