Re: Squid performance profiling

On 20/06/2013 8:00 p.m., Ahmed Talha Khan wrote:
Hello All,

I have been trying to benchmark the performance of squid for some time
now for plain HTTP and HTTPS traffic.

The key performance indicators that I am looking at are Requests Per
Second (RPS), Throughput (mbps) and Latency (ms).

My test methodology looks like this

generator(apache benchmark)<------->squid<------>server(lighttpd)


All 3 are running on separate VMs on AWS.
The specs for all the machines are
8 VCPU @ 2.13 GHZ
16 GB RAM
Squid using 8 SMP workers to utilize all cores

Using 8 workers is probably not a good idea. The recommended practice is to use one core per worker and leave at least one spare core for the kernel's usage. Squid does pass a fair chunk of work to the kernel for I/O, while each worker will completely max out as many CPU cycles as it can grab from its own core. If there is no core retained for kernel usage, those two properties will result in CPU contention slowdown as Squid and the kernel fight for cycles.
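For an 8-core VM that advice would look something like the squid.conf sketch below (the worker count and core numbers are illustrative assumptions, adjust them to your topology). It runs 7 workers pinned to cores 2-8, leaving core 1 free for the kernel:

```
# squid.conf (sketch): 7 workers on an 8-core box, one core left for the kernel
workers 7
# cpu_affinity_map pins each worker process to its own core (1-based numbering)
cpu_affinity_map process_numbers=1,2,3,4,5,6,7 cores=2,3,4,5,6,7,8
```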


In all these tests I have made sure that the generator and server are
always more powerful than squid. For latency calculation, Time per
request is calculated with and without squid inline and the difference
between them is taken.
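As a sketch of that methodology with apache benchmark (the hostnames, port and object URL below are placeholders, not from the original post), the proxied run can be routed through Squid with ab's -X option and the two "Time per request" figures subtracted:

```shell
# Baseline: generator talks to the origin server directly
ab -n 100000 -c 50 -k http://origin.example.com/object-200b

# Same workload routed through Squid via ab's -X proxy option
ab -n 100000 -c 50 -k -X squid.example.com:3128 http://origin.example.com/object-200b
```

The per-request latency attributed to Squid is then the difference between the two "Time per request" values.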

I am using a release 3.HEAD just prior to the release of 3.3.

Then please upgrade to the 3.3 stable release or a current 3.HEAD. A few memory leaks and other issues have been resolved in the time since 3.3 was released, and those fixes are in the current stable. There are also additional performance improvements in the current 3.HEAD which will be in 3.4 when it branches.


I want to share the results with the community on the squid wikis. How
to do that?

We are collecting some ad-hoc benchmark details for Squid releases at http://wiki.squid-cache.org/KnowledgeBase/Benchmarks. So far this is not exactly rigorous testing, although following the methodology for stats collection (as outlined in the intro section) retains consistency and improves comparability between submissions.

Since you are using a different methodology, please feel free to write up a new article on it. The details you just posted look like a good start. We can offer a wiki or static web page, or a reference from our benchmarking page to a blog publication of your own.

If you are intending to publish the results, I highly recommend that you settle on a packaged and numbered version of Squid so others can replicate the tests or do additional comparative testing on the same code. 3.HEAD is a rolling release for which it is relatively difficult to locate the exact sources of any given revision; the numbered packages can be referenced from our permanent archives in your description.


Some results from the tests are:

Server response size = 200 Bytes
New = keep-alive turned off
Keep-Alive = keep-alive used with 100 HTTP req/conn
C = concurrent requests

                          HTTP                   HTTPS
                     New  | Keep-Alive      New  | Keep-Alive

RPS
  c = 50            6466  | 20227           1336 | 14461
  c = 100           6392  | 21583           1303 | 14683
  c = 200           5986  | 21462           1300 | 13967

Throughput (mbps)
  c = 50              26  | 82.4             5.4 | 59
  c = 100           25.8  | 88              5.25 | 60
  c = 200             24  | 88               5.4 | 58

Latency (ms)
  c = 50             7.5  | 2.7               36 | 3.75
  c = 100           15.8  | 5.27              80 | 8
  c = 200           26.5  | 11.3             168 | 18


With these results I profiled squid with the "perf" tool and got some
results that I could not understand, so my questions are related to
them.

Thank you. Some very nice numbers. I hope they give a clue to anyone still thinking persistent connections need to be disabled to improve performance.

For the HTTPS case, the CPU utilization peaks around 90% on all cores
and the perf profiler gives:

24.63%    squid  libc-2.15.so         [.] __memset_sse2
 6.13%    squid  libcrypto.so.1.0.0   [.] bn_sqr4x_mont
 4.98%    squid  [kernel.kallsyms]    [k] hypercall_page
           |
           --- hypercall_page
              |
              |--93.73%-- check_events


Why is so much time spent in one function by squid? And in a memset
call at that! Any pointers?

Squid was originally written in C and still has a lot of memset() calls around the place clearing memory before use. We have made a few attempts to track them down and remove unnecessary usages, but a lot still remain. Another attempt was made in the more recent code, so you may find a lower profile rating in the current 3.HEAD.

Also check whether you have memory_pools on or off. That can affect the number of calls to memset().
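For reference, the relevant directives look like this (a squid.conf sketch; the 64 MB limit is an arbitrary example value, not a recommendation from this thread):

```
# squid.conf (sketch): keep freed objects pooled for reuse rather than
# returning them to the OS, reducing repeated allocate/clear cycles
memory_pools on
memory_pools_limit 64 MB
```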

Since in this case all CPU power is being used, it is understandable
that the performance cannot be improved here. The problem arises with
the HTTP case.

On the contrary, code improvements can be done to reduce Squid's CPU cycle requirements, which in turn raises performance. If your profiling can highlight things like memset() or Squid functions in the current code that consume large amounts of CPU, effort can be targeted at reducing those occurrences for the best work/performance gains.

For the plain HTTP case, the CPU utilization is only around 50-60% on
all the cores and perf says:


 8.47%    squid  [kernel.kallsyms]    [k] hypercall_page
           --- hypercall_page
           |--94.78%-- check_events

 1.78%    squid  libc-2.15.so         [.] vfprintf
 1.62%    squid  [kernel.kallsyms]    [k] xen_spin_lock
 1.44%    squid  libc-2.15.so         [.] __memcpy_ssse3_back


These results show that squid is NOT CPU bound at this point. Neither
is it network IO bound, because I can get much more throughput when I
only run the generator with the server. In this case squid should be
able to do more. Where is the bottleneck coming from?

Your guesses would seem to be in the right direction. Your data should contain hints about where to look closer. memcpy() and memory paging being so high is a suspicious hint.
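One way to look closer (a sketch; the process-name pattern used with pgrep is an assumption about how your SMP worker processes are named) is to capture call graphs from a single worker and then expand the memcpy entry in the report to see which Squid callers dominate:

```shell
# Sample one running worker for 30 seconds with call-graph recording
# (SMP kid processes typically show up as "(squid-1)", "(squid-2)", ...)
perf record -g -p "$(pgrep -f 'squid-1' | head -n1)" -- sleep 30

# Browse the samples; expanding __memcpy_ssse3_back shows its callers
perf report
```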


If anyone is interested with very detailed benchmarks, then I can provide them.

Yes please :-)

PS. Could you CC the squid-dev mailing list as well with the details? The more developer eyes we can get on this data the better. Although please do test a current release first; we have significantly changed the ACL handling, which was one bottleneck in Squid, and have altered mempools' use of memset() in several locations in the latest 3.HEAD code.

Amos

