Re: Help Needed: Any suggestion on performance downgrade after enable Cache Digest?

Alex Rousskov <rousskov@xxxxxxxxxxxxxxxxxxxxxxx> · Mon, 21 Apr 2008 08:31:05 -0600

On Mon, 2008-04-21 at 18:48 +0800, Zhou, Bo(Bram) wrote:

> Recently I did some interesting performance testing on Squid configured
> with Cache Digest enabled. The results show that Squid uses over 20%
> more CPU time than Squid running without Cache Digest. 

Thank you for posting the results with a rather detailed description
(please consider also posting the Polygraph workload next time you do this).

> Below are
> my detailed test environment, configuration, and results. Any light
> anyone can shed on the possible reason will be greatly appreciated.

Besides fetching peer digests and rebuilding the local digest, using
digests requires Squid to do the following for each "cachable" request:
- compute the digest key (should be cheap)
- look up the digest hash tables (should be cheap for one peer)
- for CD hits, ask the peer for the object (expensive)
- update the digest (cheap)
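To make the per-request steps listed above concrete, here is a minimal Python sketch of a digest as a Bloom filter. It is not Squid's code. It assumes the key is an MD5 of the request method and URL and that four bit positions are taken from the four 32-bit words of that hash (loosely modeled on Squid's design; treat the details as assumptions). `fetch_from_peer` and `fetch_from_origin` are hypothetical placeholders for the expensive and normal paths.

```python
import hashlib


class CacheDigest:
    """Toy Bloom-filter digest: a bit array probed at four positions per key."""

    def __init__(self, capacity: int, bits_per_entry: int = 5):
        # Assumption: digest size = expected entries * bits_per_entry
        # (in the spirit of digest_bits_per_entry, not Squid's exact sizing).
        self.mask_bits = max(capacity * bits_per_entry, 1)
        self.bits = bytearray((self.mask_bits + 7) // 8)

    @staticmethod
    def key(method: str, url: str) -> bytes:
        # Step 1: compute the digest key (cheap). Assumed format: MD5(method + url).
        return hashlib.md5((method + " " + url).encode()).digest()

    def _positions(self, key: bytes):
        # Split the 16-byte key into four 32-bit words; each selects one bit.
        for i in range(4):
            word = int.from_bytes(key[4 * i:4 * i + 4], "big")
            yield word % self.mask_bits

    def contains(self, key: bytes) -> bool:
        # Step 2: look up the digest (cheap). False positives are possible;
        # false negatives are not, for keys added since the last rebuild.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))

    def add(self, key: bytes) -> None:
        # Step 4: update the digest (cheap).
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)


def handle_request(method, url, peer_digest, local_digest, fetch_from_peer, fetch_from_origin):
    """Hypothetical per-request flow; the fetch_* callables are placeholders."""
    k = CacheDigest.key(method, url)
    if peer_digest.contains(k):
        body = fetch_from_peer(url)  # Step 3: digest hit -> ask the peer (expensive)
    else:
        body = fetch_from_origin(url)
    local_digest.add(k)  # assume the response was cachable and is now stored locally
    return body
```

With this layout a lookup costs one hash plus four bit tests regardless of digest size, which is why steps 1, 2, and 4 are expected to be cheap compared to a peer fetch.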

As far as I understand, your test setup measured the sum of the "cheap"
overheads but did not measure the expensive part. Perhaps more
importantly, the test did not allow for any cache digest hits, so you are
comparing a no-digest Squid with a useless-digest Squid. It would be nice
if you ran a test where all Polygraph Robots request URLs that can be in
both peer caches and where a hit has a much lower response time (because
there is no artificial server-side delay). Depending on your hit ratio
and other factors, you may see a significant overall improvement despite
the overheads (otherwise peering would be useless).
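The hit-ratio trade-off can be written as a back-of-the-envelope model. Assume (hypothetically) that a fraction h of requests are digest hits served by the peer in t_hit, the rest go to the origin in t_miss, and digests add a fixed per-request overhead c. Then digests pay off when h * (t_miss - t_hit) > c, i.e. when h exceeds c / (t_miss - t_hit). The function names and numbers below are illustrative, not measurements.

```python
def mean_response_time(h: float, t_hit: float, t_miss: float, overhead: float = 0.0) -> float:
    """Expected response time when a fraction h of requests are digest hits.

    t_hit: time to fetch from the peer; t_miss: time to fetch from the origin;
    overhead: extra per-request digest cost (key, lookup, update).
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit ratio h must be in [0, 1]")
    return h * t_hit + (1.0 - h) * t_miss + overhead


def digests_pay_off(h: float, t_hit: float, t_miss: float, overhead: float) -> bool:
    """Compare a digest-enabled proxy against one that always goes to the origin."""
    return mean_response_time(h, t_hit, t_miss, overhead) < mean_response_time(0.0, t_hit, t_miss)


# Illustrative numbers only (seconds): 20% digest hits, fast peer, slow origin.
print(digests_pay_off(0.2, t_hit=0.05, t_miss=2.5, overhead=0.002))  # True
# The "useless digest" case: no hits, so only the overhead remains.
print(digests_pay_off(0.0, t_hit=0.05, t_miss=2.5, overhead=0.002))  # False
```

The second call is the situation of a test with no digest hits: whatever the overhead is, it can only make things worse.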

I am not a big fan of CPU utilization as the primary measurement because
it can bite you if the program does more "select" loops than needed when
not fully loaded. I would recommend focusing on response time while
using CPU utilization as an internal/secondary measurement. However,
let's assume that in your particular tests CPU utilization is a good
metric (it can be!).

A 20% CPU utilization increase (~95% versus ~75% per instance) is more
than I would expect if there are no peer queries. On the other hand, you
also report a 30% CPU increase when both peers are busy (test1 versus
test2: ~95% versus ~65%). Thus, the imprecision of your test itself may be
comparable to that 20% difference. It would be interesting to see a
test4 with one busy and one idle no-digest proxy.
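The comparison above is simple arithmetic on the per-instance CPU figures in the quoted message (~95%, ~65%, ~75% of one core). The dictionary keys are illustrative labels for results (1), (2), and (3); the differences are reported both in percentage points and as relative increases.

```python
# Per-instance CPU utilization (% of one core), as reported in the quoted message.
reported = {
    "test1_digest_both_busy": 95.0,
    "test2_digest_one_busy": 65.0,
    "test3_no_digest_both_busy": 75.0,
}


def delta(a: str, b: str) -> tuple:
    """Return (percentage-point difference, relative increase in %) of a over b."""
    pa, pb = reported[a], reported[b]
    return pa - pb, 100.0 * (pa - pb) / pb


digest_cost = delta("test1_digest_both_busy", "test3_no_digest_both_busy")
busy_peer_effect = delta("test1_digest_both_busy", "test2_digest_one_busy")
print(f"digest vs no-digest: {digest_cost[0]:.0f} points ({digest_cost[1]:.0f}% relative)")
print(f"busy vs idle peer:   {busy_peer_effect[0]:.0f} points ({busy_peer_effect[1]:.0f}% relative)")
```

This prints a 20-point (27% relative) gap for digests and a 30-point (46% relative) gap between test1 and test2, which differ only in whether the other instance on the same host carries traffic.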

If you can modify the code a little, it should be fairly easy to isolate
the core reason for the CPU utilization increase compared to a no-digest
Squid. Profiling may lead to similar results.

For example, I would disable all digest lookups (return "not found"
immediately) and local updates (do nothing) to make sure the CPU
utilization matches that of a no-digest test. If CPU usage in that
test goes down by about 20%, the next step would be to check whether it is
the lookups, the updates, or both. I would leave the lookups off but
re-enable the updates and see what happens. Again, profiling may allow
you to do a similar preliminary analysis without rerunning the test.
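The ablation described above can be prototyped outside Squid to check the experimental design. The sketch is hypothetical throughout: `DigestAblation`, its `lookups_enabled`/`updates_enabled` switches, `url_key`, and the synthetic workload are illustrative, and Python sets stand in for the peer and local digests. It replays the same request stream with lookups short-circuited to "not found" and updates turned into no-ops, so each cost can be attributed separately.

```python
import hashlib
import time


def url_key(url: str) -> bytes:
    # Stand-in key function; Squid's real key format is not modeled here.
    return hashlib.md5(url.encode()).digest()


class DigestAblation:
    """Hypothetical wrapper with switches for digest lookups and local updates."""

    def __init__(self, peer_keys, lookups_enabled: bool, updates_enabled: bool):
        self.peer_keys = peer_keys   # stand-in for a fetched peer digest
        self.local_keys = set()      # stand-in for the local digest
        self.lookups_enabled = lookups_enabled
        self.updates_enabled = updates_enabled

    def lookup(self, key: bytes) -> bool:
        if not self.lookups_enabled:
            return False             # short-circuit: always "not found"
        return key in self.peer_keys

    def update(self, key: bytes) -> None:
        if self.updates_enabled:     # otherwise: do nothing
            self.local_keys.add(key)


def run(variant: DigestAblation, urls) -> tuple:
    """Replay the URLs through one variant; return (digest hits, elapsed seconds)."""
    hits = 0
    start = time.perf_counter()
    for url in urls:
        key = url_key(url)           # key computation stays on in every variant
        if variant.lookup(key):
            hits += 1
        variant.update(key)
    return hits, time.perf_counter() - start


# Synthetic workload: 5000 distinct URLs; the peer "has" the first 1000 of them.
urls = [f"http://example.com/{i % 5000}" for i in range(50000)]
peer_keys = {url_key(f"http://example.com/{i}") for i in range(1000)}

for name, lookups, updates in [
    ("lookups + updates", True, True),
    ("both disabled", False, False),
    ("updates only", False, True),
]:
    hits, secs = run(DigestAblation(peer_keys, lookups, updates), urls)
    print(f"{name:18s} hits={hits:6d} time={secs:.3f}s")
```

In Squid itself the equivalent switches would go into the digest lookup and update code paths; what matters is that every variant sees the same load, so the timing (or profile) differences can be attributed to the disabled step.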

HTH,

Alex.

> Please also point out any configuration errors. Thanks a
> lot.
> 
> 1. Hardware configuration: HP DL380
> (1) Squid server
> CPU: 2 Xeon 2.8 GHz CPUs, each with 2 cores
> Memory: 6 GB, Disk: 36 GB, NIC: 1000 Mbps
> (2) Client and web server: Dell Vostro 200 running Web Polygraph 3.1.5
> 
> 2. Squid configuration
> (1) Two Squid instances run on the same HP server, using the same IP
> address but different ports, each as a pure in-memory cache
> Squid1 configuration: 
> http_port 8081
> cache_mem 1024 MB
> cache_dir null /tmp
> cache_peer 192.168.10.2		sibling   8082  0     proxy-only
> digest_generation on
> digest_bits_per_entry 5
> digest_rebuild_period 1 hour
> digest_swapout_chunk_size 4096 bytes
> digest_rebuild_chunk_percentage 10
> 
> Squid2 configuration:
> http_port 8082
> cache_mem 1024 MB
> cache_dir null /tmp
> cache_peer 192.168.10.2		sibling   8081  0     proxy-only
> digest_generation on
> digest_bits_per_entry 5
> digest_rebuild_period 1 hour
> digest_swapout_chunk_size 4096 bytes
> digest_rebuild_chunk_percentage 10
> 
> 3. Two Polygraph clients send HTTP requests to the Squid instances, each
> client to a different instance. Each client is configured with 1000 users
> at 1.2 requests/s each, so each client sends 1200 requests/s in total.
> 
> 4. Test results (Note: the server has 4 CPU cores, so the maximum total CPU
> utilization is 400%)
> (1) Running 2 Squid instances with Cache Digest enabled, each handling 1200
> requests/second:
> Each instance used ~95% CPU, even while Squid was not rebuilding the
> digest
> 
> (2) Running 2 Squid instances with Cache Digest enabled, one handling 1200
> requests/second and one idle (no traffic to it):
> The one with traffic used ~65% CPU; the other one was idle
> 
> (3) Running 2 Squid instances with Cache Digest disabled, each handling 1200
> requests/second:
> Each instance used ~75% CPU
> 
> 
> Best Regards,
> Bo Zhou
>