On 18/02/2015 3:58 a.m., Anna Jonna Armannsdottir wrote:
> Hi everybody!
> My question may be rather theoretical, but in essence I need to know if
> Squid really has a flaw regarding latency for connections where
> keepalive is on.
>
> At ApacheCon 2014, Bryan Call presented slides where slides nr. 40 to 49
> show where he writes on slide 46 about Squid:
> "Worst median latency for keep-alive benchmarks".
> The slides are here:
> http://www.slideshare.net/bryan_call/choosing-a-proxy-server-apachecon-2014
> The configuration for Squid is shown on slide nr. 36. To my eyes it
> looks a little over-simplistic. I hope he has not configured Squid
> correctly and that somebody here can point me at a better configuration
> that expressly does not have latency of many seconds and a 95th percentile
> of over 10 seconds. Those numbers were achieved by measurement using
> CoAdvisor
> (see http://coad.measurement-factory.com/cgi-bin/coad/FaqCgi?item_id=ALL)

Thank you for pointing this out. It's nice to see someone other than me
mentioning '00K RPS rates for Squid, even if it is just lab tests. We also
usually end up with some performance improvements whenever anyone tests
anything. This time it could be the I/O latency :-)

[CC'ing the squid-dev mailing list in case anyone there wants to respond
or pick up the challenge of improving latency.]

> My intent is to use Squid with CARP or VRRP as a reverse proxy and load
> balancer for a cluster of webservers.
>
> My main reason for using Squid rather than NGINX or ATS or Varnish is
> Squid's superior protocol compliance. Bryan Call's demonstrated latency
> gives me reasons for concern.

The presentation is not clear on a few points that are needed for
replication of the results:

* What software versions he was using. We had a lot of trouble with
Varnish vs Squid benchmarks where the latest Varnish was being compared
to a 10-year-older Squid version.
In our tests a contemporary Squid proved to be within 20% of Varnish
speed, but the published documents showed orders of magnitude difference.

Also, event-driven software like Squid has a "sweet spot" for peak
performance, balancing CPU between processing the I/O queue and the event
queue. At that spot latency is quite low; go higher and the event
processing increases it, go lower and modern CPUs decrease their power
usage to reduce available cycles. 1K clients looks suspiciously like it is
just over the sweet spot for the currently most popular squid-3 versions.
I would like to see what a comparison looks like with +/- 200 clients.

* How many cores the test machine had for the proxy to use. It is not
clear if his testing was on a machine with 25+ physical cores. If not,
then there is worker contention for CPU time going on. Squid was
historically designed to make the most of a single-core CPU. All that
design is still present in each worker, so it is best to allocate only
one worker per CPU, with a spare core for the OS (virtual or
hyperthreaded cores don't count). There is also threading in Squid
(contrary to slide 29), but that is mostly for disk I/O, so he can be
forgiven for ignoring it.

It's not clear how many worker processes or threads httpd or Varnish are
using. Maybe their defaults, which are quite high. NginX is also stuck
with 24 workers. They are more lightweight than Squid ones, however...
ATS is configured with 3 "lightweight threads", which should work
stunningly well for anything at or above a single quad-core CPU.

* Whether the test is done over a network link, or the localhost machine
is coping with the client, proxy, and server all at once.

Some oddities:

* On slides 40-41 I am surprised to see that both ATS and Varnish are
supplying more responses per second than the test client was reportedly
sending. Note how it is "100K rate limited", but they reach above 100K
RPS.
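For anyone trying to replicate with the one-worker-per-CPU advice above,
the relevant squid.conf directives are shown below. The worker count and
core numbers here are illustrative for a quad-core machine, not a
recommendation for the benchmark hardware:

```
# squid.conf: SMP sketch for a quad-core box, leaving one core for the OS.
workers 3

# Pin each worker process to its own core (Squid numbers cores from 1;
# core 1 is left free here for the OS and helper processes).
cpu_affinity_map process_numbers=1,2,3 cores=2,3,4
```

Without cpu_affinity_map the kernel scheduler may migrate workers between
cores, which adds cache-miss latency that shows up in exactly this kind of
lab test.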
* If you look closely there is a 5x reduction in latency by closing TCP
connections immediately after processing one request, despite Squid
processing quite a lot more code in the close case. The CPU usage numbers
do match the extra processing though. This may be something we could
improve.

* Slide 28's mention of open()/locking - those are completely irrelevant
to properly written event processing. Though it is common to see people
used to a *threading* processing model write code like that, as if an
event were a thread that could pause.

* Not sure if it is an oddity since it is so common, but this is clearly
a biased review: listing only how others compare to ATS features rather
than how they all stand overall. Slide 44's claim of "Best cache
implementation" seems a little rich given the lack of HTTP/1.1 features
shown - fastest responding in these tests, perhaps. There is a claim of
the Apache community as a bonus, but no mention of others having any
communities. Probably other subtle things.

> I spent the last weeks searching but I have not found anything that
> seems to counter Mr. Call's claim. On behalf of the Squid developers and
> users, I would be very grateful if anybody could show or demonstrate the
> contrary. Preferably with configuration.

Since Adrian left us a few years back nobody has been doing detailed
performance analysis and test results on Squid AFAIK. We run CI Polygraph
and CoAdvisor tests to try and make gains, but there are many small
specific areas they don't cover in detail - at least in the setup we have.

1) The pipeline_prefetch directive in Squid defaults to 1, meaning that
keep-alive only saves on TCP handshake latency, not on HTTP request
parsing latency. AIUI the others default to a pipeline length of 5 or 10,
so they can process the HTTP bytes with 5-10x more parallelism. In a lab
benchmark this matters, since TCP latency is often extremely small and
variable whereas HTTP latency comes from a fixed amount of message bytes.
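If someone does want to test that pipelining difference, it is a one-line
change. Note this numeric form of the directive is for Squid 3.5 and
later; older releases only accept on/off. The value 10 below is just an
example chosen to match what the other proxies reportedly default to:

```
# squid.conf: allow deeper request pipelining on keep-alive connections.
# With a depth of N, Squid may parse ahead up to N queued requests while
# the current one is still being serviced.
pipeline_prefetch 10
```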
2) I would also like to point out the implicit meaning of slides 22 & 24.
The lack of HTTP feature support in the other software indicates that
their processing pathways are checking for fewer cases and conditions in
each protocol message. I have been finding that the CPU consumption of
Squid rises each time new HTTP features are supported. Within the CPU
capabilities the performance and/or caching ability grows better, but we
are spending CPU to gain that. His tests are not exactly straining the
extra HTTP features (mostly relevant to MISS traffic and revalidation**)
the others are lacking, so Squid is effectively wasting CPU cycles
checking or setting up message state for them.

** I doubt it given the lab test. But if the "100% HIT" was actually a
near-HIT needing revalidation under HTTP/1.1 conditions, Squid may have
extra server latency hiding in the background there.

We are already going down the track of reducing that type of
pre-processing work for squid-3.6+, but there is only so much that is
possible.

> About me:
> I have been a Squid proxy admin for almost 10 years now, and also
> administrating web cluster solutions for a small university. I am
> already deploying VRRP with NGINX as a load-balancer, but me and my
> coworkers are not satisfied with its performance.

If Squid is not up to your needs there, I suggest taking a look at
HAProxy. It specializes in the LB role, and Willy T. is pretty "onto it"
with both performance and HTTP/1.1 compliance.

Amos

_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users