To get a Squid server to handle 400M+ of traffic, I did the following:

1. Memory only

The disk I/O bottleneck is too hard to avoid at high traffic, so I do not use hard disks at all and cache HTTP objects only in memory. 32GB or 64GB of memory per box works well.

2. Disable useless ACLs

I do not use any ACLs, not even the default ones:

acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl Safe_ports port 901 # SWAT
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports

Squid itself does not apply any ACLs; security is ensured by other layers, such as iptables or ACLs on the routers.

3. refresh_pattern, mainly caching pictures

Make Squid cache objects for as long as it can, so it looks like this:

refresh_pattern -i \.(jpg|jpeg|gif|png|swf|htm|html|bmp)(\?.*)?$ 21600 100% 21600 reload-into-ims ignore-reload ignore-no-cache ignore-auth ignore-private

4. Multi-instance

I can't get a single Squid process to run above 200M, so running multiple instances makes perfect sense. Both the CARP frontend and the backend (which stores the HTTP files) need to be multi-instanced. The frontend configuration is here:

http://wiki.squid-cache.org/ConfigExamples/ExtremeCarpFrontend

I heard that Squid still can't handle "huge" memory properly, so I split the big memory into 6-8GB per instance, with each instance listening on a port below 80. For a box with 32GB of memory, the CARP frontend config looks like this:

cache_peer 192.168.1.73 parent 76 0 carp name=73-76 proxy-only
cache_peer 192.168.1.73 parent 77 0 carp name=73-77 proxy-only
cache_peer 192.168.1.73 parent 78 0 carp name=73-78 proxy-only
cache_peer 192.168.1.73 parent 79 0 carp name=73-79 proxy-only

5. CARP frontend - cache_mem 0 MB

I used to use "cache_mem 0 MB". Over time I came to think that fetching files smaller than 1.5KB from the CARP backend is a waste. Am I right? I use these now:

cache_mem 5 MB
maximum_object_size_in_memory 1.5 KB

6. Separate LAN and WAN

Again, this is to split the load across NICs. Use the LAN for clients and CARP interaction, and the WAN to fetch content from the internet.

7. Use the official NIC driver

Sometimes the chip vendor's official driver behaves better than the built-in driver, so it is worth a try.

8. Based on Gentoo

With Gentoo we can strip out as many unneeded functions as possible, making the cache system thinner and faster.

9. Strip useless compile options and runtime options

Proper CFLAGS and LDFLAGS are needed. Here is one good doc:

http://en.gentoo-wiki.com/wiki/Safe_Cflags

~ # squid -v
Squid Cache: Version 2.7.STABLE9
configure options: '--prefix=/usr' '--build=x86_64-pc-linux-gnu' '--host=x86_64-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--libdir=/usr/lib64' '--sysconfdir=/etc/squid' '--libexecdir=/usr/libexec/squid' '--localstatedir=/var' '--datadir=/usr/share/squid' '--disable-auth' '--disable-delay-pools' '--enable-removal-policies=lru,heap' '--enable-ident-lookups' '--enable-useragent-log' '--enable-cache-digests' '--enable-referer-log' '--enable-http-violations' '--with-pthreads' '--with-large-files' '--enable-wccpv2' '--enable-htcp' '--enable-carp' '--enable-icmp' '--enable-follow-x-forwarded-for' '--enable-x-accelerator-vary' '--enable-kill-parent-hack' '--enable-cachemgr-hostname=squid37' '--enable-err-languages=English' '--enable-default-err-language=English' '--with-maxfd=65535' '--without-libcap' '--disable-snmp' '--disable-ssl' '--enable-storeio=ufs,diskd,coss,aufs,null' '--enable-async-io' '--enable-linux-netfilter' '--disable-linux-tproxy' '--enable-epoll' 'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-pc-linux-gnu' 'CC=x86_64-pc-linux-gnu-gcc'
'CFLAGS=-march=barcelona -mtune=barcelona -O2 -pipe' 'LDFLAGS=-Wl,-O1 -Wl,--as-needed'

10. sysctl tuning

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_syn_retries = 3
net.ipv4.tcp_synack_retries = 3
net.ipv4.tcp_max_syn_backlog = 4096
net.core.netdev_max_backlog = 4096
net.ipv4.ip_local_port_range = 1024 65534
net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_tcp_timeout_established = 1000
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_max_orphans = 16384
net.ipv4.tcp_orphan_retries = 1
net.ipv4.ipfrag_high_thresh = 524288
net.ipv4.ipfrag_low_thresh = 262144
kernel.pid_max = 65535
vm.swappiness = 1
net.ipv4.tcp_mem = 6085248 8113664 12170496
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_rmem = 4096 87380 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 8388608
net.core.wmem_default = 8388608
net.core.wmem_max = 8388608
net.core.somaxconn = 512
net.ipv4.udp_mem = 6194688 8259584 12389376
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1

This is all I did to get high performance. What should I do to get even better performance? Any more advice?
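
With this many sysctl keys it is easy for one to be silently missing on a given kernel, so it may be worth checking what is actually in effect. The following is only a rough sketch, not something from the setup above: the function names (norm, key_to_path, check_sysctl) are made up for illustration, and it assumes a Linux box where the live values are readable under /proc/sys.

```shell
#!/bin/sh
# Sketch: compare "key = value" lines against the live values in /proc/sys.
# All function names here are hypothetical helpers, not part of any tool.

# Collapse tabs and runs of spaces to one space, trim both ends.
norm() {
    tr -s '\t ' '  ' | sed 's/^ //; s/ $//'
}

# Map a sysctl key such as net.core.somaxconn to its /proc/sys path.
key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Read "key = value" lines on stdin and print one "ok" or "DIFF" line per key.
check_sysctl() {
    while IFS= read -r line; do
        case "$line" in
            '#'*) continue ;;   # skip comments
            *=*) ;;             # looks like a setting
            *) continue ;;      # skip blank or unrelated lines
        esac
        key=$(printf '%s' "${line%%=*}" | norm)
        want=$(printf '%s' "${line#*=}" | norm)
        path=$(key_to_path "$key")
        if [ -r "$path" ]; then
            have=$(norm < "$path")
        else
            have="(missing)"
        fi
        if [ "$want" = "$have" ]; then
            status=ok
        else
            status=DIFF
        fi
        printf '%s %s want=%s have=%s\n' "$status" "$key" "$want" "$have"
    done
}
```

Assuming the settings live in /etc/sysctl.conf, they can be loaded with "sysctl -p" and then checked with "check_sysctl < /etc/sysctl.conf". Any line starting with DIFF is a key whose live value does not match the file, or which the running kernel does not expose at all.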