The big issues you have are:
* using NTLM. This seriously caps the proxy performance and capacity.
Each new TCP connection (~30 per second from your graphs) requires at
least two full HTTP request/reply round trips just to authenticate
before the actual HTTP response can even begin to be identified and
fetched.
* basing access permissions on group lookups. Like NTLM this caps the
capacity of your Squid.
* using a URL helper. Whether that is a big drag or not depends on what
you are using it for and whether Squid can do that faster by itself.
These are your big performance bottlenecks. Eliminating any of them will
speed up your proxy. BUT whether it is worth doing is up to you.
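
To put the NTLM point above in rough numbers, here is a minimal Python sketch. It assumes exactly two authentication round trips per new connection (the minimum mentioned above) and the ~30 connections per second read off the graphs; the function name is made up for illustration and nothing here is measured from Squid itself.

```python
# Rough estimate of NTLM handshake overhead. All figures are illustrative
# assumptions taken from the discussion above, not measurements.

def ntlm_overhead(new_connections_per_sec, auth_round_trips=2):
    """Return (auth_exchanges_per_sec, total_exchanges_per_sec).

    Each new TCP connection spends `auth_round_trips` HTTP request/reply
    exchanges on authentication before the one exchange that carries the
    first real response.
    """
    auth = new_connections_per_sec * auth_round_trips
    total = auth + new_connections_per_sec
    return auth, total


auth, total = ntlm_overhead(30)
# prints "60 auth exchanges/sec, 90 exchanges/sec in total"
print(f"{auth} auth exchanges/sec, {total} exchanges/sec in total")
```

Under those assumptions two out of every three exchanges on a new connection are pure authentication overhead, and each one also has to wait on the auth helper.
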
All of the evidence (to my eye anyway) points to NTLM being the cause
of a temporary bandwidth flood around 13:30-13:45. Whether that
matches your report of "slow" is unknown. You should drop NTLM anyway
if you can. It has officially been deprecated by MS, and Kerberos is
far more efficient and faster.
From your graphs I note your peak traffic time of 13:15-13:45 shows a
bandwidth peak of almost 10Mbps. I guess these are the "slow" times
your users are complaining about? It is expected that things slow down
when the bandwidth completely fills up, although whether you are
working off a 10Mbps NIC is unknown. The TCP graphs show an increase in
the early part of the peak, and the HTTP response rate peaks in the
second half. This is consistent with NTLM sucking up extra bandwidth
authenticating new connections. The first half of the peak is initial
TCP setup plus the first HTTP requests. HTTP then peaks in a burst of
challenge responses, followed by further HTTP as the clients re-send
their requests to complete the handshake and the actual HTTP response
part of the cycle happens. Client requests peak in both halves;
outbound bandwidth peaks only in the second half, where the larger
responses are involved.
The HTTP response time quadruples (20ms -> 80ms) in the 15 minutes
*before* these peaks occur and the HIT ratio jumps by ~15% over the
peak traffic time. That is consistent with a number of requests
queueing at the authentication and group lookup stages.
I guess you have a 10Mbps NIC, which could be part of the issue:
9.7Mbps is a suspicious number for peak bandwidth. Squid should be able
to handle 50-100 req/sec despite NTLM, and yet it is maxing out at 30.
If your NICs are faster, all of the above can still happen just the
same, due to the processing time / response time of the helpers. But on
faster NICs I would expect to see higher bandwidth, higher TCP
connection rates, and longer HTTP response times on the held-up
connection attempts.
Alternatively, after 16:30 and before 07:30 the TCP speeds are ramping
up/down between the daily normal and the overnight low traffic
throughput. Squid is designed on a traffic-driven event model. We have
some issues where, when traffic per millisecond is low enough, several
components in Squid start taking ~10ms pauses between handling events
(to protect against cycling at 100% CPU checking for nothing), and this
can cause response times to increase somewhat. If your reports are
coming in from the early birds or late workers this is probably the
reason.
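
As a rough illustration of why those idle pauses add latency, here is a small Python sketch. It is a deliberately simplified model, not Squid's actual event loop: it assumes the loop is completely idle and only wakes on a fixed grid every `pause_ms` milliseconds, and the function name and arrival times are made up for the example.

```python
# Simplified model of an idle event loop that only wakes every pause_ms.
# This is NOT Squid's real scheduler; it only shows where the extra
# latency comes from when traffic is sparse.

def handling_delays(arrivals_ms, pause_ms=10):
    """Return how long each event waits before the loop next wakes up.

    Assumes wake-ups at t = 0, pause_ms, 2 * pause_ms, ... and that an
    event is handled at the first wake-up at or after its arrival.
    """
    delays = []
    for t in arrivals_ms:
        next_wake = -(-t // pause_ms) * pause_ms  # ceiling to the wake grid
        delays.append(next_wake - t)
    return delays


delays = handling_delays([3, 12, 25, 40])
print(delays)                      # prints [7, 8, 5, 0]
print(sum(delays) / len(delays))   # prints 5.0
```

In this model a request arriving at a random moment waits about half the pause on average at each component that is sleeping, which is small per step but noticeable next to a 20ms baseline response time.
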
On 16/11/2012 11:50 p.m., Fuhrmann, Marcel wrote:
I have some performance graphs. Maybe they will help:
http://ubuntuone.com/09XVmTzqmNAPgVDmc6h2yI
I see two other weird things.
* FQDN cache is not storing DNS responses for some reason - that will
cause a little bit of slowdown.
* the packets/sec graph at your peak traffic (10Mbps) is only showing
~500 packets. Do you have jumbo packets enabled on your network? If so
it looks like you are getting bandwidth in packets averaging ~2.5KB
(10Mbps / 8 bits / 500 packets per second), which will cause some
requests to be held up slightly behind other large packets. This is an
effect which gets worse as the bandwidth pipes approach full.
There is no matching congestion control ICMP traffic peak showing up -
so I'm not sure of the accuracy there.
Amos