On Mon, 1 Jun 2009 17:29:18 -0700, Ray Van Dolson <rvandolson@xxxxxxxx> wrote:
> Any suggestions on how to go about evaluating web traffic for
> "cacheability"?  I have access to a port that can see all the web
> traffic in our company.
>
> I'd like to be able to gauge how many hits there are to common sites to
> get a feel for how much bandwidth savings we could potentially gain by
> implementing a company-wide web cache.

Depending on how much you tune the config, expect the savings to fall
somewhere between 10% of your HTTP traffic as a lower bound (no tuning)
and 50% as an upper bound. That is for HTTP traffic only, so the overall
percentage is lower depending on how much non-HTTP traffic goes through
your network: if, say, HTTP is 60% of the total and you save 30% of it,
the overall saving is about 18%.

> I suppose creative use of tcpdump could be used here (obviously not
> catching https traffic), but maybe there's a more polished tool or some
> slicker way to do this.

The most reliable way to know is to set up a test proxy and push a small
amount of the traffic through it. The summary overview Squid provides
includes a measure of the percentage of bandwidth served as local HITs
(i.e. saved from going external); a rough sketch of pulling that figure
out of the access log is at the end of this mail.

The tools out there (www.ircache.net/cgi-bin/cacheability.py and
redbot.org) are spot-check tools for finding out why a particular
resource isn't caching once you already know which resource to look at;
a bare-bones version of that kind of header check is sketched below as
well. If anyone knows of a stand-alone tool, please speak up; I'm
interested as well.

Amos
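
P.S. Here is the promised sketch for the access log. It is a rough,
untested example assuming Squid's default native log format, where
field 4 is the result code (e.g. TCP_HIT/200) and field 5 is the reply
size in bytes; adjust the field indexes if you use a custom logformat.

#!/usr/bin/env python
# Sum the reply bytes in a Squid access.log (native format) and report
# what percentage of them was served by cache HITs of any kind.
import sys

hit_bytes = 0
total_bytes = 0

for line in open(sys.argv[1]):
    fields = line.split()
    if len(fields) < 5:
        continue                     # skip malformed lines
    code = fields[3]                 # e.g. TCP_HIT/200, TCP_MISS/200
    try:
        size = int(fields[4])        # reply size in bytes
    except ValueError:
        continue
    total_bytes += size
    if "HIT" in code:                # TCP_HIT, TCP_MEM_HIT, TCP_IMS_HIT, ...
        hit_bytes += size

if total_bytes:
    print("bandwidth served from cache: %.1f%%"
          % (100.0 * hit_bytes / total_bytes))
else:
    print("no requests found")

Point it at the log (e.g. /var/log/squid/access.log) after the test
proxy has been carrying traffic for a day or two.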
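
And the bare-bones header spot-check, doing by hand a small part of
what cacheability.py and redbot.org check. Again an untested sketch
(Python 3 standard library only); the real tools apply far more of the
caching rules than this does.

#!/usr/bin/env python3
# Fetch only the headers of one URL and report the response headers
# that determine whether a shared cache can store and reuse the reply.
import sys
import urllib.request

url = sys.argv[1]
req = urllib.request.Request(url, method="HEAD")
resp = urllib.request.urlopen(req)

cc = resp.headers.get("Cache-Control", "")
for name in ("Cache-Control", "Expires", "Last-Modified", "ETag", "Vary"):
    print("%-15s %s" % (name + ":", resp.headers.get(name) or "(none)"))

# Very rough classification; the real tools apply the full RFC 2616 rules.
if "no-store" in cc or "private" in cc:
    print("=> explicitly not storable by a shared cache")
elif "max-age" in cc or "s-maxage" in cc or resp.headers.get("Expires"):
    print("=> explicit freshness information; should produce HITs")
elif resp.headers.get("Last-Modified"):
    print("=> no explicit freshness; caches may store it heuristically")
else:
    print("=> no caching headers at all; unlikely to give useful HITs")

Some servers refuse HEAD requests; if you get an error back, change the
method to GET and accept pulling the body down once.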