On 18/02/2015 3:58 a.m., Anna Jonna Armannsdottir wrote:
> Hi everybody!
> My question may be rather theoretical, but in essence I need to know if
> Squid really has a flaw regarding latency for connections where
> keepalive is on.
>
> At ApacheCon 2014, Bryan Call presented slides where slides nr. 40 to 49
> show where he writes on slide 46 about Squid:
> "Worst median latency for keep-alive benchmarks".
> The slides are here:
> http://www.slideshare.net/bryan_call/choosing-a-proxy-server-apachecon-2014
> The configuration for Squid is shown on slide nr. 36. To my eyes it
> looks a little over-simplistic. I hope he has not configured Squid
> correctly and that somebody here can point me at a better configuration
> that expressly does not have latency of many seconds and a 95th percentile
> of over 10 seconds. Those numbers were achieved by measurement using
> CoAdvisor
> (see http://coad.measurement-factory.com/cgi-bin/coad/FaqCgi?item_id=ALL)

Thank you for pointing this out. It's nice to see someone other than me
mentioning '00K RPS rates for Squid, even if it is just lab tests. We also
usually end up with some performance improvements whenever anyone tests
anything. This time it could be the I/O latency :-)

[CC'ing the squid-dev mailing list in case anyone there wants to respond
or pick up the challenge of improving latency.]

> My intent is to use Squid with CARP or VRRP as a reverse proxy and load
> balancer for a cluster of webservers.
>
> My main reason for using Squid rather than NGINX or ATS or Varnish is
> Squid's superior protocol compliance. Bryan Call's demonstrated latency
> gives me reasons for concern.

The presentation is not clear on a few points that are needed for
replication of the results:

* What software versions he was using. We had a lot of trouble with
Varnish vs Squid benchmarks where the latest Varnish was being compared
to a 10-year-older Squid version.
In our tests a contemporary Squid proved to be within 20% of Varnish
speed, but the published documents showed orders of magnitude difference.

Also, event-driven software like Squid has a "sweet spot" for peak
performance, balancing CPU between processing the I/O queue and the event
queue. At that spot latency is quite low; go higher and the event
processing increases it, go lower and modern CPUs decrease their power
usage to reduce available cycles. 1K clients looks suspiciously like it is
just over the sweet spot for the currently most popular squid-3 versions.
I would like to see what a comparison looks like with +/- 200 clients.

* How many cores the test machine had for the proxy to use. It is not
clear if his testing was on a machine with 25+ physical cores. If not,
then there is worker contention for CPU time going on. Squid was
historically designed to make the most of a single-core CPU. All that
design is still present in each worker, so it is best to allocate only
one worker per CPU, with a spare core for the OS (virtual or
hyperthreaded cores don't count). There is also threading in Squid
(contrary to slide 29), but that is mostly for disk I/O, so he can be
forgiven for ignoring it.

It's not clear how many worker processes or threads httpd or Varnish are
using. Maybe their defaults, which are quite high. NginX is also stuck
with 24 workers. They are more lightweight than Squid ones, however...
ATS is configured with 3 "lightweight threads", which should work
stunningly well for anything at or above a single quad-core CPU.

* Whether the test is done over a network link, or the localhost machine
is coping with the client, proxy, and server all at once.

Some oddities:

* On slides 40-41 I am surprised to see that both ATS and Varnish are
supplying more responses per second than the test client was reportedly
sending. Note how it is "100K rate limited", but they reach above 100K
RPS.
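For anyone trying to replicate with the one-worker-per-CPU advice above,
the relevant squid.conf directives are shown below. The worker count and
core numbers here are illustrative for a quad-core machine, not a
recommendation for the benchmark hardware:

```
# squid.conf: SMP sketch for a quad-core box, leaving one core for the OS.
workers 3

# Pin each worker process to its own core (Squid numbers cores from 1;
# core 1 is left free here for the OS and helper processes).
cpu_affinity_map process_numbers=1,2,3 cores=2,3,4
```

Without cpu_affinity_map the kernel scheduler may migrate workers between
cores, which adds cache-miss latency that shows up in exactly this kind of
lab test.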
* If you look closely there is a 5x reduction in latency by closing TCP
connections immediately after processing one request, despite Squid
processing quite a lot more code in the close case. The CPU usage numbers
do match the extra processing though. This may be something we could
improve.

* Slide 28's mention of open()/locking - those are completely irrelevant
to properly written event processing. Though it is common to see people
used to a *threading* processing model write code like that, as if an
event were a thread that could pause.

* Not sure if it is an oddity since it is so common, but this is clearly
a biased review: listing only how others compare to ATS features rather
than how they all stand overall. Slide 44's claim of "Best cache
implementation" seems a little rich given the lack of HTTP/1.1 features
shown - fastest responding in these tests, perhaps. There is a claim of
the Apache community as a bonus, but no mention of others having any
communities. Probably other subtle things.

> I spent the last weeks searching but I have not found anything that
> seems to counter Mr. Call's claim. On behalf of the Squid developers and
> users, I would be very grateful if anybody could show or demonstrate the
> contrary. Preferably with configuration.

Since Adrian left us a few years back nobody has been doing detailed
performance analysis and test results on Squid AFAIK. We run CI Polygraph
and CoAdvisor tests to try and make gains, but there are many small
specific areas they don't cover in detail - at least in the setup we have.

1) The pipeline_prefetch directive in Squid defaults to 1, meaning that
keep-alive only saves on TCP handshake latency, not on HTTP request
parsing latency. AIUI the others default to a pipeline length of 5 or 10,
so they can process the HTTP bytes with 5-10x more parallelism. In a lab
benchmark this matters, since TCP latency is often extremely small and
variable whereas HTTP latency comes from a fixed amount of message bytes.
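If someone does want to test that pipelining difference, it is a one-line
change. Note this numeric form of the directive is for Squid 3.5 and
later; older releases only accept on/off. The value 10 below is just an
example chosen to match what the other proxies reportedly default to:

```
# squid.conf: allow deeper request pipelining on keep-alive connections.
# With a depth of N, Squid may parse ahead up to N queued requests while
# the current one is still being serviced.
pipeline_prefetch 10
```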
2) I would also like to point out the implicit meaning of slides 22 & 24.
The lack of HTTP feature support in the other software indicates that
their processing pathways are checking for fewer cases and conditions in
each protocol message. I have been finding that the CPU consumption of
Squid rises each time new HTTP features are supported. Within the CPU
capabilities the performance and/or caching ability grows better, but we
are spending CPU to gain that. His tests are not exactly straining the
extra HTTP features (mostly relevant to MISS traffic and revalidation**)
the others are lacking, so Squid is effectively wasting CPU cycles
checking or setting up message state for them.

** I doubt it given the lab test. But if the "100% HIT" was actually a
near-HIT needing revalidation under HTTP/1.1 conditions, Squid may have
extra server latency hiding in the background there.

We are already going down the track of reducing that type of
pre-processing work for squid-3.6+, but there is only so much that is
possible.

> About me:
> I have been a Squid proxy admin for almost 10 years now, and also
> administrating web cluster solutions for a small university. I am
> already deploying VRRP with NGINX as a load-balancer, but me and my
> coworkers are not satisfied with its performance.

If Squid is not up to your needs there, I suggest taking a look at
HAProxy. It specializes in the LB role, and Willy T. is pretty "onto it"
with both performance and HTTP/1.1 compliance.

Amos

_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users