AW: AW: any chance to optimize squid3?

"Fuhrmann, Marcel" <Marcel.Fuhrmann@xxxxxx> · Mon, 19 Nov 2012 18:14:13 +0000

Hi Amos,

thank you for your assessment. So I will try to fix these big issues first. 
I can remove the squidguard because my firewall can do this, too.
I will try to use Kerberos to authenticate to ADS.

--
 Marcel

-----Original Message-----
From: Amos Jeffries [mailto:squid3@xxxxxxxxxxxxx] 
Sent: Friday, 16 November 2012 23:58
To: squid-users@xxxxxxxxxxxxxxx
Subject: Re:  AW: any chance to optimize squid3?

The big issues you have are:
* using NTLM. This seriously caps the proxy performance and capacity. 
Each new TCP connection (~30 per second from your graphs) requires at least two full HTTP request/reply round trips just to authenticate before the actual HTTP response can begin to be identified and fetched.

* basing access permissions on group membership. Like NTLM, this caps the capacity of your Squid.

* using a URL helper. Whether that is a big drag or not depends on what you are using it for and whether Squid can do that faster by itself.

These are your big performance bottlenecks. Eliminating any of them will speed up your proxy. BUT whether it is worth doing is up to you.
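
As a rough illustration of the NTLM cost described above, the sketch below counts the HTTP exchanges needed per second when every new connection has to authenticate first. The rate of 30 connections per second and the figure of two round trips are taken from the text ("at least two", so this is a lower bound); the function names are made up for this example, and only the first request on each connection is counted.

```python
# Lower-bound estimate of NTLM handshake overhead on new connections.
# Figures come from the discussion: ~30 new TCP connections per second,
# at least 2 request/reply round trips to authenticate each one.

def auth_exchanges_per_sec(new_conn_rate, auth_round_trips=2):
    """Request/reply pairs per second spent purely on authentication."""
    return new_conn_rate * auth_round_trips

def total_exchanges_per_sec(new_conn_rate, auth_round_trips=2):
    """Authentication exchanges plus the one exchange that carries real content."""
    return new_conn_rate * (auth_round_trips + 1)

if __name__ == "__main__":
    rate = 30  # approximate value read off the graphs
    print("auth-only exchanges/sec:", auth_exchanges_per_sec(rate))
    print("total exchanges/sec:", total_exchanges_per_sec(rate))
```

Under these assumptions, at least two out of every three exchanges on a fresh connection carry no useful content.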

All of the evidence is (to my eye anyway) looking like NTLM being the cause of a temporary bandwidth flood around 13:30-13:45. Whether that is matching your report of "slow" is unknown. You should drop NTLM anyway if you can. It has officially been deprecated by MS and Kerberos is far more efficient and faster.

From your graphs I note your peak traffic time of 13:15-13:45 shows a bandwidth peak of almost 10Mbps. I guess that these are the "slow" times your users are complaining about? It is expected that things slow down when the bandwidth completely fills up, although whether you are working off a 10Mbps NIC is unknown. The TCP graphs show an increase in the early part of the peak, and the HTTP response rate peaks in the second half. This is consistent with NTLM sucking up extra bandwidth authenticating new connections: the first half of the peak is initial TCP setup plus the first HTTP requests; HTTP then peaks in a burst of challenge responses, followed by further HTTP as the clients send the handshake re-request and the actual HTTP response part of the cycle happens (client requests peak in both halves, outgoing bandwidth peaks only in the second half with the larger responses involved).

The HTTP response time quadruples (20ms -> 80ms) in the 15 minutes *before* these peaks occur, and the HIT ratio jumps by ~15% over the peak traffic time. This is consistent with a number of requests queueing at the authentication and group lookup stages.

I guess you have a 10Mbps NIC, which could be part of the issue. Squid should be able to handle 50-100 req/sec despite NTLM, and yet it is maxing out at 30. But 9.7Mbps is a suspicious number for peak bandwidth. 
If your NICs are faster, all of the above can happen just the same, due to processing time / response time for the helpers. But on a faster NIC I would expect to see higher bandwidth, higher TCP connection rates, and longer HTTP response times on the held-up connection attempts.
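
To show why a 9.7Mbps peak looks suspicious, here is a small sketch that expresses the observed peak as a fraction of a few common link speeds. The 9.7Mbps figure is from the graphs discussed above; the list of candidate link speeds is an assumption for illustration, since the actual NIC speed is not known.

```python
# Compare the observed bandwidth peak against candidate link capacities.
# Only the 9.7 Mbps peak comes from the graphs; the link speeds are guesses.

def link_utilization(observed_bps, link_bps):
    """Fraction of the link capacity consumed by the observed traffic."""
    return observed_bps / link_bps

if __name__ == "__main__":
    observed_peak_bps = 9_700_000
    candidate_links_bps = {
        "10Mbps": 10_000_000,
        "100Mbps": 100_000_000,
        "1Gbps": 1_000_000_000,
    }
    for name, capacity in candidate_links_bps.items():
        share = link_utilization(observed_peak_bps, capacity)
        print(f"{name}: {share:.1%} of capacity")
```

On a 10Mbps link the peak would sit at about 97% of capacity, which is what a saturated link looks like; on anything faster the same peak would leave most of the link idle.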

Alternatively, after 16:30 and before 07:30 the TCP speeds are ramping up/down between the daily normal and overnight low traffic throughput. 
Squid is designed on a traffic-driven event model. We have some issues where, when the traffic per millisecond is low enough, several components in Squid start taking ~10ms pauses between handling events (to avoid cycling at 100% CPU checking for nothing), which can cause the response times to increase somewhat. If your reports are coming in from the early birds or late workers this is probably the reason.

On 16/11/2012 11:50 p.m., Fuhrmann, Marcel wrote:
> I have some performance graphs. Maybe they will help:
> http://ubuntuone.com/09XVmTzqmNAPgVDmc6h2yI
>

I see two other weird things.

* FQDN cache is not storing DNS responses for some reason - that will cause a little bit of slowdown.

* the packets/sec graph at your peak traffic (10Mbps) is only showing ~500 packets. Do you have jumbo packets enabled on your network? If so it looks like you are getting bandwidth in packets of ~2.5KB (10Mbps spread over ~500 packets/sec), which will cause some requests to be held up slightly behind other large packets. 
This is an effect which gets worse as the bandwidth pipes approach full. 
There is no matching congestion control ICMP traffic peak showing up - so I'm not sure of the accuracy there.
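
As a quick check on the packet-size estimate, the sketch below divides the peak bandwidth by the packet rate. It assumes the bandwidth graph is in bits per second and that both graphs cover the same interface and direction, neither of which is confirmed by the graphs themselves; the function name is made up for this example.

```python
# Average packet size implied by a bandwidth figure and a packet rate.
# Assumes bandwidth is in bits/sec and both graphs measure the same traffic.

STANDARD_ETHERNET_MTU = 1500  # bytes, the usual non-jumbo Ethernet payload limit

def avg_packet_bytes(bits_per_sec, packets_per_sec):
    """Mean bytes per packet for the given throughput and packet rate."""
    return bits_per_sec / 8 / packets_per_sec

if __name__ == "__main__":
    size = avg_packet_bytes(10_000_000, 500)
    print(f"average packet size: {size:.0f} bytes")
    print("larger than a standard frame:", size > STANDARD_ETHERNET_MTU)
```

An average well above 1500 bytes would be hard to explain without jumbo frames, or without one of the two graphs being inaccurate.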

Amos