Re: squid 3.1.10 page allocation failure. order:1, mode:0x20

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Mon, 19 Aug 2013 19:27:45 +1200

On 17/08/2013 6:45 a.m., inittab wrote:
Hello,

I wanted to get some suggestions on my current setup and ask if i'm
expecting too much out of my hardware for the traffic load.

Sorry for the slow reply.

NOTE: If you determine that it is a memory leak, please upgrade to the 
current Squid-3.3 or later versions. There are a few dozen leaks in 3.1 
and 3.2 series of various sizes which have been fixed. Not everybody is 
hitting them due to specific behaviour causing each one, but you may be.

it appears i am running into out of memory problems and hitting swap,
squid processes then end up dying out.
[root@squid01 squid]# dmesg | grep "page allocation"
swapper: page allocation failure. order:1, mode:0x20
kswapd0: page allocation failure. order:1, mode:0x20
kswapd0: page allocation failure. order:1, mode:0x20
kswapd0: page allocation failure. order:1, mode:0x20
kswapd0: page allocation failure. order:1, mode:0x20
kswapd0: page allocation failure. order:1, mode:0x20
kswapd0: page allocation failure. order:1, mode:0x20
kswapd0: page allocation failure. order:1, mode:0x20
kswapd0: page allocation failure. order:1, mode:0x20
kswapd0: page allocation failure. order:1, mode:0x20
squid: page allocation failure. order:1, mode:0x20

I currently have 2 dell 2950's running squid 3.1.10, we generally see
~200Mbps total.

How many HTTP requests/second are you seeing? That is the most relevant 
traffic speed metric for Squid.

FYI: 200Mbps of traffic could be coming from 1 single HTTPS / CONNECT 
request per day, or from a million IMS requests. The effect on and by 
Squid CPU and memory is drastically different for each of those cases 
and varies greatly for all permutations in between.

Each request requires some KB of buffer memory. Compare 1 request/day 
with a million requests/day and you can see where the relevance starts 
to appear for your particular problem.
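
If you do not have the request rate to hand, it can be estimated from 
access.log. A minimal Python sketch, assuming the common log format that 
"emulate_httpd_log on" produces, with the timestamp in square brackets. 
The function name and the log path in the usage note are only 
illustrative, not anything shipped with Squid:

```python
import re
from collections import Counter

# Matches the bracketed timestamp of a common-log-format line,
# e.g. [19/Aug/2013:19:27:45 +1200]. Assumes one request per line.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def requests_per_second(lines):
    """Return (mean, peak) requests/second over the seconds that saw traffic.

    Seconds with no logged request are not counted in the mean, so on a
    quiet proxy the mean overstates the true average rate.
    """
    per_second = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            # The timestamp has one-second resolution, so it is the bucket key.
            per_second[match.group(1)] += 1
    if not per_second:
        return 0.0, 0
    total = sum(per_second.values())
    return total / len(per_second), max(per_second.values())
```

Usage would be along the lines of opening the log (for example 
/var/log/squid/access.log, a path I am assuming) and passing the file 
object to requests_per_second(). The peak figure is the one that matters 
for buffer memory.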

box stats are:
2x Six-Core AMD Opteron(tm) Processor 2427 @2.2Ghz
32gb ram
1x Intel E1G44HTBLK Server Adapter I340-T4 all 4 ports bonded with 802.3ad
/var/spool/squid 512G raid5

Ah. RAID. Well, there are some more disk I/O overheads you could 
possibly avoid:
http://wiki.squid-cache.org/SquidFaq/RAID

Keep in mind that the cache data is effectively a local _backup_ of data 
elsewhere. It is non-critical. The only benefit you gain from RAID is 
advance warning about disk failures and some time to correct them 
without Squid crashing.

The boxes are both running 10 squid processes on different ports in
transparent mode
I am using iptables rules to redirect traffic to the different squid ports ex:
   22M 1351M REDIRECT   tcp  --  *      *       10.96.0.0/15
0.0.0.0/0           statistic mode random probability 0.100000 tcp
dpt:80 redir ports 3120
   20M 1216M REDIRECT   tcp  --  *      *       10.96.0.0/15
0.0.0.0/0           statistic mode random probability 0.100000 tcp
dpt:80 redir ports 3121
   18M 1094M REDIRECT   tcp  --  *      *       10.96.0.0/15
0.0.0.0/0           statistic mode random probability 0.100000 tcp
dpt:80 redir ports 3122
   16M  985M REDIRECT   tcp  --  *      *       10.96.0.0/15
0.0.0.0/0           statistic mode random probability 0.100000 tcp
dpt:80 redir ports 3123
   15M  886M REDIRECT   tcp  --  *      *       10.96.0.0/15
0.0.0.0/0           statistic mode random probability 0.100000 tcp
dpt:80 redir ports 3124
   13M  798M REDIRECT   tcp  --  *      *       10.96.0.0/15
0.0.0.0/0           statistic mode random probability 0.100000 tcp
dpt:80 redir ports 3125
   12M  718M REDIRECT   tcp  --  *      *       10.96.0.0/15
0.0.0.0/0           statistic mode random probability 0.100000 tcp
dpt:80 redir ports 3126
   11M  647M REDIRECT   tcp  --  *      *       10.96.0.0/15
0.0.0.0/0           statistic mode random probability 0.100000 tcp
dpt:80 redir ports 3127
9631K  582M REDIRECT   tcp  --  *      *       10.96.0.0/15
0.0.0.0/0           statistic mode random probability 0.100000 tcp
dpt:80 redir ports 3128
8668K  524M REDIRECT   tcp  --  *      *       10.96.0.0/15
0.0.0.0/0           statistic mode random probability 0.100000 tcp
dpt:80 redir ports 3129
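
A side note on the counters above: they shrink by roughly 10% per rule 
(22M, 20M, 18M, ...). That is what you would expect if the rules are 
evaluated in the order listed and a connection stops at the first rule 
that matches, since each rule then sees only what the earlier ones let 
through. A small Python sketch of that arithmetic, under exactly that 
assumption; the function names are mine:

```python
def cascade_shares(probabilities):
    """Share of all new connections taken by each rule in a first-match chain.

    Returns (shares, unmatched), where unmatched is the fraction that
    falls through every rule.
    """
    shares = []
    remaining = 1.0
    for p in probabilities:
        shares.append(remaining * p)  # this rule takes p of what is left
        remaining *= 1.0 - p          # the rest moves on to the next rule
    return shares, remaining


def even_probabilities(n):
    """Per-rule probabilities that give each of n rules an equal share."""
    # Rule i must take 1/(n - i) of the remainder; the last rule takes all.
    return [1.0 / (n - i) for i in range(n)]


shares, unmatched = cascade_shares([0.1] * 10)
print([round(s, 3) for s in shares], round(unmatched, 3))
```

With ten rules at 0.1 each, the first proxy gets 10% of connections, the 
last about 3.9%, and about 35% match none of the ten rules. What happens 
to those depends on the rest of the chain, which is not shown here. 
Probabilities of 1/10, 1/9, 1/8 and so on down to 1 would spread the 
load evenly.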

sysctl.conf:
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.netfilter.nf_conntrack_max = 196608

example squid config file: squid-p3120.conf
acl adminnet src 10.3.25.0/24
acl proxyvlan src 10.5.22.0/24
acl SSL_ports port 443
acl Safe_ports port 80        # http
acl Safe_ports port 21        # ftp
acl Safe_ports port 443        # https
acl Safe_ports port 70        # gopher
acl Safe_ports port 210        # wais
acl Safe_ports port 1025-65535    # unregistered ports
acl Safe_ports port 280        # http-mgmt
acl Safe_ports port 488        # gss-http
acl Safe_ports port 591        # filemaker
acl Safe_ports port 777        # multiling http
acl CONNECT method CONNECT
http_access allow manager localhost
http_access allow manager adminnet
http_access allow manager proxyvlan
http_access deny manager

For high speed Squid-3.2 or later I am recommending that people at least 
place the manager ACL tests down ...

http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access deny to_localhost
... here, so that the faster port rejections can protect better against 
some DDoS issues. ("manager" has become a regex test.)

http_access allow localhost
http_access allow customers

NP: given the above ACLs are all allow, you could in this proxy even 
move the manager allow lines down to here. They will be far rarer than 
your normal client traffic I think.

http_access deny all
hierarchy_stoplist cgi-bin ?
You can simplify the config by removing hierarchy_stoplist.

coredump_dir /var/spool/squid/p3120
refresh_pattern ^ftp:        1440    20%    10080
refresh_pattern ^gopher:    1440    0%    1440
refresh_pattern -i (/cgi-bin/|\?) 0    0%    0
refresh_pattern .        0    20%    4320
hosts_file /etc/hosts
dns_nameservers 10.5.7.13 10.5.7.23
cache_replacement_policy heap LFUDA
cache_swap_low 90
cache_swap_high 95
maximum_object_size_in_memory 96 KB
maximum_object_size 100 MB
cache_dir aufs /var/spool/squid/p3120 204800 16 256
cache_mem 100 MB
logfile_rotate 10
memory_pools off

It does vary between installations, but memory_pools can remove a lot of 
memory allocator overhead when it is enabled.

quick_abort_min 0 KB
quick_abort_max 0 KB
log_icp_queries off
client_db off
buffered_logs on
half_closed_clients off
url_rewrite_children 20
pid_filename /var/run/squid-p3120.pid
unique_hostname squid01-p3120.eng.XXXXXX
visible_hostname squid.eng.XXXXXXX
icp_port 3100
tcp_outgoing_address 10.5.22.101
emulate_httpd_log on

Anyone have any suggestions on whether or not i'm doing something
terribly wrong here or missing some kind of performance tuning?

Your memory requirements in MB of RAM per proxy are:
  100 (cache_mem) + 15*0.1 (cache_mem index) + 15*205 (cache_dir index) 
+ 0.25 * R (active request buffers)

I note that this is already 3.1GB just for the index values. So 10 
proxies will leave only ~1GB of RAM for the operating system, other 
processes, and Squid's active request buffering.
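
The same estimate as a small Python sketch, so you can try other sizes. 
It assumes the rule of thumb in the formula above (15 MB of index per GB 
of cache, 0.25 MB per active request) and converts MB to GB by dividing 
by 1000, so it comes out a few MB under my rounded figures. The function 
and parameter names are illustrative:

```python
def proxy_ram_mb(cache_mem_mb, cache_dir_mb, active_requests=0,
                 index_mb_per_gb=15.0, buffer_mb_per_request=0.25):
    """Rough RAM estimate in MB for one Squid instance.

    The defaults are the rule-of-thumb figures from the formula above,
    not measured values.
    """
    mem_index = index_mb_per_gb * cache_mem_mb / 1000.0  # cache_mem index
    dir_index = index_mb_per_gb * cache_dir_mb / 1000.0  # cache_dir index
    buffers = buffer_mb_per_request * active_requests    # request buffers
    return cache_mem_mb + mem_index + dir_index + buffers


current = proxy_ram_mb(cache_mem_mb=100, cache_dir_mb=204800)
reduced = proxy_ram_mb(cache_mem_mb=100, cache_dir_mb=100000)
print(current, 10 * current)   # 3173.5 31735.0
print(reduced, 10 * reduced)   # 1601.5 16015.0
```

Both figures exclude the request buffers (active_requests=0), so add 
0.25 MB for each request you expect to have in flight.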

I suggest dropping the cache_dir size to 100000 and measuring the RAM 
usage on the box to see how far you can increase it back up.

Amos