Hi again,
Went into work the next day and carried on, only to find that one of the
SSDs was spewing CRC failures into the log. I don't recall seeing any
of these in the previous days, and the logs didn't contain any entries
prior to my email, so I assume this is a new problem.
I've pulled the SSDs and replaced them with 160GB SATA II disks. I then
ran curl-loader with a 150-client load, and the resulting 5-minute stats
dump is posted below.
sample_start_time = 1299342747.28318 (Sat, 05 Mar 2011 16:32:27 GMT)
sample_end_time = 1299343047.45842 (Sat, 05 Mar 2011 16:37:27 GMT)
client_http.requests = 115.309931/sec
client_http.hits = 0.003333/sec
client_http.errors = 0.000000/sec
client_http.kbytes_in = 25.411849/sec
client_http.kbytes_out = 195.575242/sec
client_http.all_median_svc_time = 0.012346 seconds
client_http.miss_median_svc_time = 0.013086 seconds
client_http.nm_median_svc_time = 0.000000 seconds
client_http.nh_median_svc_time = 0.000000 seconds
client_http.hit_median_svc_time = 0.000000 seconds
server.all.requests = 0.299982/sec
server.all.errors = 0.000000/sec
server.all.kbytes_in = 1.573241/sec
server.all.kbytes_out = 0.333314/sec
server.http.requests = 0.266651/sec
server.http.errors = 0.000000/sec
server.http.kbytes_in = 1.103269/sec
server.http.kbytes_out = 0.253319/sec
server.ftp.requests = 0.000000/sec
server.ftp.errors = 0.000000/sec
server.ftp.kbytes_in = 0.000000/sec
server.ftp.kbytes_out = 0.000000/sec
server.other.requests = 0.033331/sec
server.other.errors = 0.000000/sec
server.other.kbytes_in = 0.469973/sec
server.other.kbytes_out = 0.076662/sec
icp.pkts_sent = 0.000000/sec
icp.pkts_recv = 0.000000/sec
icp.queries_sent = 0.000000/sec
icp.replies_sent = 0.000000/sec
icp.queries_recv = 0.000000/sec
icp.replies_recv = 0.000000/sec
icp.replies_queued = 0.000000/sec
icp.query_timeouts = 0.000000/sec
icp.kbytes_sent = 0.000000/sec
icp.kbytes_recv = 0.000000/sec
icp.q_kbytes_sent = 0.000000/sec
icp.r_kbytes_sent = 0.000000/sec
icp.q_kbytes_recv = 0.000000/sec
icp.r_kbytes_recv = 0.000000/sec
icp.query_median_svc_time = 0.000000 seconds
icp.reply_median_svc_time = 0.000000 seconds
dns.median_svc_time = 4.177065 seconds
unlink.requests = 0.000000/sec
page_faults = 0.000000/sec
select_loops = 202.268185/sec
select_fds = 498.770865/sec
average_select_fd_period = 0.002005/fd
median_select_fds = 0.000000
swap.outs = 0.063330/sec
swap.ins = 0.000000/sec
swap.files_cleaned = 0.000000/sec
aborted_requests = 0.049997/sec
syscalls.polls = 202.268185/sec
syscalls.disk.opens = 0.063330/sec
syscalls.disk.closes = 0.063330/sec
syscalls.disk.reads = 0.000000/sec
syscalls.disk.writes = 0.406643/sec
syscalls.disk.seeks = 0.000000/sec
syscalls.disk.unlinks = 0.000000/sec
syscalls.sock.accepts = 220.630446/sec
syscalls.sock.sockets = 99.554185/sec
syscalls.sock.connects = 0.216654/sec
syscalls.sock.binds = 99.554185/sec
syscalls.sock.closes = 209.474431/sec
syscalls.sock.reads = 147.911360/sec
syscalls.sock.writes = 242.719155/sec
syscalls.sock.recvfroms = 2.516520/sec
syscalls.sock.sendtos = 0.079995/sec
cpu_time = 8.350000 seconds
wall_time = 300.017524 seconds
cpu_usage = 2.783171%
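For anyone who wants to pick through a dump like this programmatically rather than by eye, here is a minimal Python sketch. It assumes the "name = value" layout shown above; the function name parse_squid_stats and the 1-second threshold are made up for illustration, and the sample is just a few lines copied from the dump. It only filters the numbers and is not a diagnosis.

```python
import re

def parse_squid_stats(text):
    """Parse 'name = value[unit]' lines into a dict of floats.

    Units such as '/sec', ' seconds' and '%' are ignored, as is the
    human-readable date that follows the sample_*_time values.
    """
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\w.]+)\s*=\s*(-?\d+(?:\.\d+)?)", line)
        if m:
            stats[m.group(1)] = float(m.group(2))
    return stats

# A few lines copied from the 5-minute dump above.
sample = """\
client_http.requests = 115.309931/sec
client_http.hits = 0.003333/sec
client_http.all_median_svc_time = 0.012346 seconds
dns.median_svc_time = 4.177065 seconds
cpu_usage = 2.783171%
"""

stats = parse_squid_stats(sample)

# Fraction of client requests served from cache during the sample.
hit_ratio = stats["client_http.hits"] / stats["client_http.requests"]

# Median service times above an arbitrary (assumed) 1-second threshold.
slow = {k: v for k, v in stats.items() if k.endswith("svc_time") and v > 1.0}

print(f"hit ratio: {hit_ratio:.6f}")
print("service times over 1 s:", slow)
```

Pasting the whole dump into sample works the same way, since lines that do not match the pattern are skipped.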
Whether replacing the SSDs cures the problem in real-world use will be seen on Monday.
Thanks again,
Julian
On 03/03/11 17:59, Pieter De Wit wrote:
Hi Julian,
The one stat that I can't see here is disk access. I know you said
that you have SSDs, but what are the disk stats for your logging
volume and the squid volume? If you totally bypass the proxy, does it
improve? (Could it be that the squid server is getting shaped?)
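One way to get rough disk stats on a Linux box without extra tools is to sample /proc/diskstats twice and compare. The sketch below is a minimal Python illustration under stated assumptions: the field positions follow the kernel's iostats layout (device name in column 3, milliseconds spent doing I/O in column 13), the function names are hypothetical, and the two snapshots are invented numbers, not measurements from this server.

```python
def parse_diskstats(text):
    """Return {device: (reads_completed, writes_completed, ms_doing_io)}."""
    out = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip blank or short lines
        out[f[2]] = (int(f[3]), int(f[7]), int(f[12]))
    return out

def utilisation(before, after, interval_s):
    """Percent of the interval each device spent with I/O in flight."""
    return {
        dev: 100.0 * (after[dev][2] - before[dev][2]) / (interval_s * 1000.0)
        for dev in after
        if dev in before
    }

# Two made-up snapshots taken 5 seconds apart (illustrative values only).
t0 = "   8       0 sda 1000 0 8000 500 2000 0 16000 900 0 1200 1400"
t1 = "   8       0 sda 1100 0 8800 550 2600 0 20800 1200 0 3700 4100"

before = parse_diskstats(t0)
after = parse_diskstats(t1)
util = utilisation(before, after, interval_s=5)

for dev, pct in util.items():
    reads = after[dev][0] - before[dev][0]
    writes = after[dev][1] - before[dev][1]
    print(f"{dev}: {pct:.1f}% busy, {reads} reads, {writes} writes")
```

On a live machine the two strings would come from reading /proc/diskstats before and after a sleep; a device sitting near 100% busy while squid is slow would point at the cache or logging volume.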
Cheers,
Pieter
On 4/03/2011 06:46, Julian Pilfold-Bagwell wrote:
Hi All,
I've been having some problems with Squid and Dansguardian for a
while now and despite lots of time on Google, haven't found a solution.
The problem started a week or so back when I noticed that squid was
slowing. A quick look through the logs showed it was running out of
file descriptors, so I raised the limit to compensate. The server
was ancient, so I bought in an HP Proliant DL120 (dual Pentium 2.80GHz
G6950 CPU & 4GB of RAM). At the same time, I bought in 2 x 60GB SSD
drives to use as cache space, with the system itself on a RAID 1 array
of 160GB SATA II disks.
On this, I installed Ubuntu server 10.04.2 LTS with Squid 2.7 (from
apt) and Dansguardian 2.10.1.1. The kernel version is
2.6.32-24-server and the server authenticates via a Samba PDC (v
3.5.6) using OpenLDAP/Winbind. The Samba version on the proxy
machine is v 3.4.7 as supplied from the Ubuntu repo.
This, however, also seems to run out of steam. My first thought was
that it may have been running out of RAM so I ran htop. Both CPUs
were topping out at 20% and out of the 4GB of RAM, 1.3GB was used.
Next I checked the load on the NIC and found that it was running on
average 400kB/s, with the odd burst at 5MB/s. As the load increased,
web pages were taking up to 30-45 seconds to load. I bypassed
Dansguardian and went in on 3128 with no change in performance.
Following the recommendations on other sites discovered via Google, I
tuned and tweaked settings with no real benefit and I can't see that
I changed anything to cause it to happen. The log files look fine, I
have 10000 file descriptors available and cachemgr shows plenty of
spares. There are 50% more NTLM authenticators than are in use at any
given time.
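As a rough way to put a number on "plenty of spares", the file-descriptor section of the cachemgr info page can be parsed and turned into a percentage. This is a minimal Python sketch: the label text is assumed from typical cachemgr "info" output and may differ between Squid versions, the function name fd_usage is hypothetical, and the figures in the sample are invented apart from the 10000 limit mentioned above.

```python
def fd_usage(info_text):
    """Extract file-descriptor counters from cachemgr 'info' style text."""
    counters = {}
    for line in info_text.splitlines():
        label, sep, value = line.partition(":")
        # Keep only 'label: integer' lines that mention file descriptors.
        if sep and "file desc" in label and value.strip().isdigit():
            counters[label.strip()] = int(value)
    return counters

# Illustrative sample; only the 10000 limit comes from the config.
sample_info = """\
File descriptor usage for squid:
	Maximum number of file descriptors:   10000
	Largest file desc currently in use:     812
	Number of file desc currently in use:   640
	Available number of file descriptors:  9360
"""

counters = fd_usage(sample_info)
in_use = counters["Number of file desc currently in use"]
maximum = counters["Maximum number of file descriptors"]
pct_used = 100.0 * in_use / maximum

print(f"{in_use} of {maximum} descriptors in use ({pct_used:.1f}%)")
```

Logging this figure every minute or so during a slow period would show whether descriptor usage climbs toward the limit when pages start taking 30-45 seconds.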
The config file for Squid is shown below. I have had the number of
authenticators set to 400 as I have 350 users but the number in use
still peaked at around 50. If I've been a numpty and done something
glaringly obvious, I'd be grateful if someone could point it out. If
not, ask for info and I'll provide it.
Thanks,
Jools
## Squid.conf
## Start with authentication for clients
auth_param ntlm program /usr/bin/ntlm_auth --helper-protocol=squid-2.5-ntlmssp
auth_param ntlm_param children 100
auth_param ntlm keep_alive on
auth_param basic program /usr/bin/ntlm_auth --helper-protocol=squid-2.5-basic
auth_param basic children 100
auth_param basic realm Squid proxy-caching web server
auth_param basic credentialsttl 2 hours
## Access Control Lists for filter bypass ##
acl realtek dstdomain .realtek.com.tw
acl tes dstdomain .tes.co.uk
acl glogster dstdomain .glogster.com
acl adobe-installer dstdomain .adobe.com # allow installs from adobe download manager
acl actihealth dstdomain .actihealth.com .actihealth.net # Allow direct access for PE dept activity monitors
acl spybotupdates dstdomain .safer-networking.org .spybotupdates.com # Allow updates for Spybot S&D
acl sims-update dstdomain .kcn.org.uk .capitaes.co.uk .capitasolus.co.uk .sims.co.uk # Allow SIMS to update itself directly
acl kcc dstdomain .kenttrustweb.org.uk # Fix problem with county
acl frenchconference dstdomain flashmeeting.e2bn.net
acl emsonline dstdomain .emsonline.kent.gov.uk
acl clamav dstdomain .db.gb.clamav.net
acl ubuntu dstdomain .ubuntu.com .warwick.ac.uk
acl windowsupdate dstdomain windowsupdate.microsoft.com
acl windowsupdate dstdomain .update.microsoft.com
acl windowsupdate dstdomain download.windowsupdate.com
acl windowsupdate dstdomain redir.metaservices.microsoft.com
acl windowsupdate dstdomain images.metaservices.microsoft.com
acl windowsupdate dstdomain c.microsoft.com
acl windowsupdate dstdomain www.download.windowsupdate.com
acl windowsupdate dstdomain wustat.windows.com
acl windowsupdate dstdomain crl.microsoft.com
acl windowsupdate dstdomain sls.microsoft.com
acl windowsupdate dstdomain productactivation.one.microsoft.com
acl windowsupdate dstdomain ntservicepack.microsoft.com
acl windowsupdate dstdomain download.adobe.com
acl comodo dstdomain download.comodo.com
acl simsb2b dstdomain emsonline.kent.gov.uk
acl powerman dstdomain pmstats.org
acl ability dstdomain ability.com
acl fulston dstdomain fulstonmanor.kent.sch.uk
acl httpsproxy dstdomain .retiredsanta.com .atunnel.com .btunnel.com .ctunnel.com .dtunnel.com .ztunnel.com .partyaccount.com
## Access Control for filtered users ##
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl SSL_ports port 443
acl ntlm_users proxy_auth REQUIRED
acl SSL_ports port 443 # https
acl SSL_ports port 563 # snews
acl SSL_ports port 873 # rsync
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl Safe_ports port 631 # cups
acl Safe_ports port 873 # rsync
acl Safe_ports port 901 # SWAT
acl purge method PURGE
acl CONNECT method CONNECT
## Allow/Deny Lists ##
http_access allow manager localhost
http_access deny manager
http_access allow purge localhost
http_access deny purge
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow emsonline
http_access allow clamav
http_access allow realtek
http_access allow ubuntu
http_access allow tes
http_access allow glogster
http_access allow kcc
http_access allow fulston
http_access allow comodo
http_access allow ability
http_access allow powerman
http_access allow windowsupdate
http_access allow simsb2b
http_access allow adobe-installer
http_access allow actihealth
http_access allow spybotupdates
http_access allow sims-update
http_access allow frenchconference
http_access allow ntlm_users
http_access deny httpsproxy
http_access allow localhost
http_access deny all
icp_access deny all
## Cache Settings ##
log_fqdn off
half_closed_clients off
maximum_object_size 1024 KB
cache_access_log none
cache_store_log none
http_port 3128
redirect_children 750
hierarchy_stoplist cgi-bin ?
cache_mem 128 MB
memory_replacement_policy lru
cache_replacement_policy lru
cache_dir ufs /fastcache1 15000 16 256
cache_dir ufs /fastcache2 15000 16 256
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern (Release|Package(.gz)*)$ 0 20% 2880
refresh_pattern . 0 20% 4320
acl shoutcast rep_header X-HTTP09-First-Line ^ICY.[0-9]
upgrade_http0.9 deny shoutcast
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
extension_methods REPORT MERGE MKACTIVITY CHECKOUT
cache_effective_user proxy
## Hash out effective group as it stops access to winbind privileged pipe and breaks authentication - jpb
# cache_effective_group proxy
max_filedescriptors 10000
dns_nameservers 172.20.0.253 172.31.49.46 172.31.81.46
hosts_file /etc/hosts
coredump_dir /var/spool/squid
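With a config this long, a quick mechanical sanity check can help rule out simple slips before looking for deeper causes. The Python sketch below is only an illustration: check_acls is a hypothetical helper, it looks solely at acl and http_access lines, it treats everything after a "#" as a comment (which would be wrong for a regex ACL containing "#"), and the sample is a handful of lines taken from the config above.

```python
def check_acls(conf_text):
    """Report repeated acl lines and http_access rules naming unknown ACLs."""
    defined, seen = set(), set()
    duplicates, undefined = [], []
    for raw in conf_text.splitlines():
        tokens = raw.split("#", 1)[0].split()  # drop trailing comments
        if not tokens:
            continue
        if tokens[0] == "acl" and len(tokens) >= 3:
            defined.add(tokens[1])
            key = tuple(tokens[1:])
            if key in seen:
                duplicates.append(" ".join(tokens))
            seen.add(key)
        elif tokens[0] == "http_access" and len(tokens) >= 3:
            # tokens[1] is allow/deny; the rest are ACL names, maybe negated.
            for name in tokens[2:]:
                name = name.lstrip("!")
                if name not in defined:
                    undefined.append(name)
    return duplicates, undefined

# A few lines copied from the config above.
sample_conf = """\
acl all src 0.0.0.0/0.0.0.0
acl SSL_ports port 443
acl ntlm_users proxy_auth REQUIRED
acl SSL_ports port 443 # https
acl SSL_ports port 563 # snews
acl CONNECT method CONNECT
http_access deny CONNECT !SSL_ports
http_access allow ntlm_users
http_access deny all
"""

duplicates, undefined = check_acls(sample_conf)
print("repeated acl lines:", duplicates)
print("http_access names with no acl:", undefined)
```

A repeated line such as the second SSL_ports port 443 entry is harmless, but the same scan over the full file would also show up any rule that refers to an ACL name that was mistyped or never defined.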
--
Julian Pilfold-Bagwell,
Network Manager,
Borden Grammar School,
Avenue of Remembrance,
Sittingbourne,
Kent,
ME10 4DB.
Tel: 01795 424192