Hello All,

I have been benchmarking the performance of Squid for some time now, for both plain HTTP and HTTPS traffic. The key performance indicators I am looking at are requests per second (RPS), throughput (Mbps) and latency (ms).

My test setup looks like this:

    generator (Apache Benchmark) <-------> Squid <-------> server (lighttpd)

All 3 are running on separate VMs on AWS. The specs for all the machines are:

    8 vCPUs @ 2.13 GHz
    16 GB RAM
    Squid using 8 SMP workers to utilize all cores

In all these tests I have made sure that the generator and the server are always more powerful than Squid. For the latency calculation, the time per request is measured with and without Squid inline, and the difference between the two is taken.

I am using a 3.HEAD build from just before the release of 3.3.

I would like to share the results with the community on the Squid wiki. How do I go about that?

Some results from the tests:

    Server response size = 200 bytes
    New        = keep-alive turned off
    Keep-Alive = keep-alive used, with 100 HTTP requests per connection
    c          = concurrent requests

                              HTTP                     HTTPS
                        New  | Keep-Alive        New  | Keep-Alive
    RPS
      c=50             6466  | 20227            1336  | 14461
      c=100            6392  | 21583            1303  | 14683
      c=200            5986  | 21462            1300  | 13967
    Throughput (Mbps)
      c=50             26    | 82.4             5.4   | 59
      c=100            25.8  | 88               5.25  | 60
      c=200            24    | 88               5.4   | 58
    Latency (ms)
      c=50             7.5   | 2.7              36    | 3.75
      c=100            15.8  | 5.27             80    | 8
      c=200            26.5  | 11.3             168   | 18

With these results in hand I profiled Squid with the "perf" tool and got some output that I could not understand, so my questions relate to that.

For the HTTPS case, CPU utilization peaks at around 90% on all cores and the perf profiler gives:

    24.63%  squid  libc-2.15.so         [.] __memset_sse2
     6.13%  squid  libcrypto.so.1.0.0   [.] bn_sqr4x_mont
     4.98%  squid  [kernel.kallsyms]    [k] hypercall_page
            |
            --- hypercall_page
               |
               |--93.73%-- check_events

Why does Squid spend so much time in a single function, and in memset of all things? Any pointers?
Since the HTTPS case already uses nearly all of the available CPU, it is understandable that performance cannot be improved much there. The problem arises with the HTTP case.

For the plain HTTP case, CPU utilization is only around 50-60% on all cores and perf says:

     8.47%  squid  [kernel.kallsyms]    [k] hypercall_page
            |
            --- hypercall_page
               |--94.78%-- check_events
     1.78%  squid  libc-2.15.so         [.] vfprintf
     1.62%  squid  [kernel.kallsyms]    [k] xen_spin_lock
     1.44%  squid  libc-2.15.so         [.] __memcpy_ssse3_back

These results show that Squid is NOT CPU bound at this point. Nor is it network I/O bound, because I get much more throughput when I run the generator directly against the server. Squid should therefore be able to do more in this case. Where is the bottleneck coming from?

If anyone is interested in the very detailed benchmarks, I can provide them.

--
Regards,
-Ahmed Talha Khan
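
P.S. For anyone who wants to compare the table with their own runs, here is a minimal Python sketch of the arithmetic behind the latency figure described above (time per request with Squid inline minus time per request without it). It is only an illustration. The function names are made up for this example, the relation "time per request = concurrency / RPS" is an assumption about how the mean per-request time relates to the reported RPS, "Mbps" is assumed to mean 10^6 bits per second, and the direct (no Squid) baseline used below is a placeholder, not a measured value.

```python
"""Sanity checks for ab-style benchmark figures (illustrative sketch only)."""


def time_per_request_ms(concurrency: int, requests_per_second: float) -> float:
    """Mean time per request in ms, ASSUMING it equals concurrency / RPS."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return concurrency / requests_per_second * 1000.0


def added_latency_ms(via_proxy_ms: float, direct_ms: float) -> float:
    """Latency attributed to the proxy: time via the proxy minus time direct."""
    return via_proxy_ms - direct_ms


def bytes_per_request(throughput_mbps: float, requests_per_second: float) -> float:
    """Average bytes transferred per request, ASSUMING Mbps = 10**6 bits/s."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return throughput_mbps * 1_000_000 / 8 / requests_per_second


if __name__ == "__main__":
    # Figures from the table: plain HTTP, new connections, c=50.
    via_squid = time_per_request_ms(50, 6466)

    # HYPOTHETICAL placeholder: the direct (no Squid) RPS is not given in this mail.
    hypothetical_direct_rps = 25000.0
    direct = time_per_request_ms(50, hypothetical_direct_rps)

    print(f"time per request via Squid: {via_squid:.2f} ms")
    print(f"added latency (placeholder baseline): "
          f"{added_latency_ms(via_squid, direct):.2f} ms")
    print(f"bytes per request at 26 Mbps: {bytes_per_request(26, 6466):.0f}")
```

Under those assumptions, the c=50 plain HTTP row works out to roughly 500 bytes per request, which would be plausible for a 200-byte body plus headers. Treat it as a rough consistency check and nothing more.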