Squid performance profiling

Ahmed Talha Khan <auny87@xxxxxxxxx> · Thu, 20 Jun 2013 13:00:03 +0500

Hello All,

I have been trying to benchmark the performance of Squid for some time
now, for both plain HTTP and HTTPS traffic.

The key performance indicators that I am looking at are Requests Per
Second (RPS), Throughput (mbps) and Latency (ms).

My test setup looks like this:

generator(apache benchmark)<------->squid<------>server(lighttpd)

All 3 are running on separate VMs on AWS.
The specs for all the machines are:
8 vCPUs @ 2.13 GHz
16 GB RAM
Squid using 8 SMP workers to utilize all cores

In all these tests I have made sure that the generator and server are
always more powerful than squid. For latency calculation, Time per
request is calculated with and without squid inline and the difference
between them is taken.
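
The latency calculation described above can be sketched as follows. This is
only an illustration: the function name and the sample timings are
hypothetical and are not measured values from these tests.

```python
def added_latency_ms(with_proxy_ms: float, direct_ms: float) -> float:
    """Latency attributed to the proxy.

    with_proxy_ms: mean time per request with Squid inline.
    direct_ms:     mean time per request with the generator talking
                   straight to the server.
    """
    return with_proxy_ms - direct_ms


# Hypothetical timings for illustration only (not measured values).
overhead = added_latency_ms(with_proxy_ms=9.0, direct_ms=1.5)
print(f"added latency: {overhead:.2f} ms")
```

Both inputs should come from runs at the same concurrency level, otherwise
the difference mixes proxy overhead with queueing effects.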

I am using a 3.HEAD build from just prior to the release of 3.3.

I want to share the results with the community on the Squid wiki. How
do I do that?

Some results from the tests are:

Server response size = 200 Bytes
New means keep-alive was turned off (a new connection for every request)
Keep-Alive means keep-alive was used, with 100 HTTP requests per connection
c = number of concurrent requests

                          HTTP                    HTTPS
                     New | Keep-Alive        New | Keep-Alive

RPS
  c=50              6466 | 20227            1336 | 14461
  c=100             6392 | 21583            1303 | 14683
  c=200             5986 | 21462            1300 | 13967

Throughput (mbps)
  c=50                26 | 82.4              5.4 | 59
  c=100             25.8 | 88               5.25 | 60
  c=200               24 | 88                5.4 | 58

Latency (ms)
  c=50               7.5 | 2.7                36 | 3.75
  c=100             15.8 | 5.27               80 | 8
  c=200             26.5 | 11.3              168 | 18
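
As a rough cross-check of the numbers above, the sketch below derives the
implied bytes per request and the mean time per request from the c=50 rows.
It assumes that "mbps" means 10^6 bits per second and that the generator
keeps exactly c requests in flight, so that mean time per request is
c / RPS. The dictionary layout and function names are made up for this
example.

```python
# (mode, concurrency) -> (requests per second, throughput in mbps),
# copied from the c=50 rows of the table above.
RESULTS = {
    ("http-new", 50): (6466, 26.0),
    ("http-keepalive", 50): (20227, 82.4),
    ("https-new", 50): (1336, 5.4),
    ("https-keepalive", 50): (14461, 59.0),
}


def bytes_per_request(rps: float, mbps: float) -> float:
    """Bytes on the wire per request, assuming mbps = 10**6 bits/s."""
    return mbps * 1_000_000 / 8 / rps


def mean_time_per_request_ms(concurrency: int, rps: float) -> float:
    """Mean time per request, assuming `concurrency` requests are always in flight."""
    return concurrency / rps * 1000


for (mode, c), (rps, mbps) in RESULTS.items():
    size = bytes_per_request(rps, mbps)
    t_ms = mean_time_per_request_ms(c, rps)
    print(f"{mode:16s} c={c:3d}  {size:6.0f} bytes/req  {t_ms:6.2f} ms/req")
```

Under these assumptions all four c=50 rows work out to a little over 500
bytes per request, which would be consistent with a 200-byte body plus
response headers.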

With these results in hand, I profiled Squid with the "perf" tool and got
some output that I could not understand. My questions relate to that
output.

For the HTTPS case, the CPU utilization peaks around 90% on all cores
and the perf profiler gives:

    24.63%    squid  libc-2.15.so         [.] __memset_sse2
     6.13%    squid  libcrypto.so.1.0.0   [.] bn_sqr4x_mont
     4.98%    squid  [kernel.kallsyms]    [k] hypercall_page
              |
              --- hypercall_page
                 |
                 |--93.73%-- check_events

Why does Squid spend so much time in a single routine, and a memset
at that? Any pointers?

Since all the CPU power is being used in this case, it is understandable
that the performance cannot be improved here. The problem arises with
the HTTP case.

For the plain HTTP case, the CPU utilization is only around 50-60% on
all the cores and perf says:

8.47%    squid  [kernel.kallsyms]    [k] hypercall_page
                          --- hypercall_page
                          |--94.78%-- check_events

1.78%    squid  libc-2.15.so         [.] vfprintf
1.62%    squid  [kernel.kallsyms]    [k] xen_spin_lock
1.44%    squid  libc-2.15.so         [.] __memcpy_ssse3_back

These results show that Squid is NOT CPU bound at this point. Nor is it
network I/O bound, because I can get much more throughput when I run the
generator directly against the server. In this case Squid should be
able to do more. Where is the bottleneck coming from?

If anyone is interested in more detailed benchmarks, I can provide them.

--
Regards,
-Ahmed Talha Khan