Re: Architecture for scaling delivery of large static files

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Thu, 16 Jul 2009 02:08:41 +1200

Jamie Tufnell wrote:
Hi,

I am wondering if Squid is the right tool to solve a scaling problem
we're having.

Our static content is currently served directly from Apache boxes to
the end-user:

User <=> Apache

Originally it was just one Apache box, but its disk I/O became saturated,
and now we have three Apache boxes, each with its own copy of our
library on direct-attached storage.

The problem is that our library is getting quite large, and although
three copies of everything is nice, I think continuing down this road
any further is going to result in a lot of unnecessary duplication of
content (read: $).

So, I'm thinking of changing it to be more like this:

User <=> Squid CARP <=> Squid Caches <=> Apache

The idea being we can scale delivery capacity and library capacity
independently.

When our delivery needs grow or shrink, we'll add or remove machines in
the cache layer. These machines would be RAM-heavy and have a high
spindle-to-GB ratio (or SSDs).
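A CARP front end suits this plan because it picks a cache by hashing
the URL, so each object lives on one cache box, and adding or removing
a box only moves the URLs that box gains or loses. The sketch below
illustrates that property with rendezvous (highest-random-weight)
hashing, the family of schemes CARP belongs to. It is a toy under stated
assumptions: the MD5-based score, the host names and the URLs are all
hypothetical, and it does not reproduce Squid's actual CARP hash
function or per-peer load factors.

```python
import hashlib
from typing import List


def score(node: str, url: str) -> int:
    """Hash the (node, url) pair to a 64-bit integer; the highest score wins."""
    digest = hashlib.md5(f"{node}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cache(nodes: List[str], url: str) -> str:
    """Rendezvous (HRW) hashing: route the URL to the node with the top score."""
    return max(nodes, key=lambda node: score(node, url))


if __name__ == "__main__":
    # Hypothetical cache host names and object URLs.
    caches = ["cache1", "cache2", "cache3"]
    urls = [f"/library/file{i}.bin" for i in range(10000)]

    before = {u: pick_cache(caches, u) for u in urls}
    after = {u: pick_cache(caches + ["cache4"], u) for u in urls}

    moved = [u for u in urls if before[u] != after[u]]
    # Every URL that moved now belongs to the new node; the rest stay put.
    print(f"{len(moved) / len(urls):.1%} of URLs remapped,",
          "all to cache4:", all(after[u] == "cache4" for u in moved))
```

Running it should show roughly a quarter of the URLs moving, all of
them to the new box, while the rest keep hitting their already-warm
caches.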

When our library grows or shrinks, we'll attach or detach storage on
the Apache origin server. We'll use two of the existing Apache servers
as the origin, with failover for redundancy.
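Two origins "with failover" boils down to trying the preferred origin
first and falling back to the standby only when it cannot be reached.
Squid can do this itself when more than one origin parent is
configured, with its own dead-peer detection; the Python sketch below
only illustrates the ordered-fallback logic, and `OriginDown`, the
fetch callables and the origin names are hypothetical.

```python
from typing import Callable, List, Tuple


class OriginDown(Exception):
    """Raised by a fetch callable when its origin cannot be reached."""


def fetch_with_failover(origins: List[Tuple[str, Callable[[str], bytes]]],
                        path: str) -> Tuple[str, bytes]:
    """Try each origin in order and return (name, body) from the first that answers."""
    errors = []
    for name, fetch in origins:
        try:
            return name, fetch(path)
        except OriginDown as exc:
            errors.append(f"{name}: {exc}")  # remember why, then try the next one
    raise OriginDown("all origins failed: " + "; ".join(errors))


if __name__ == "__main__":
    def origin1(path: str) -> bytes:  # hypothetical primary, currently down
        raise OriginDown("connection refused")

    def origin2(path: str) -> bytes:  # hypothetical standby
        return b"contents of " + path.encode()

    print(fetch_with_failover([("origin1", origin1), ("origin2", origin2)],
                              "/library/a.bin"))
    # prints ('origin2', b'contents of /library/a.bin')
```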

So, is this a sane use of Squid?

Yes.

Is there a better way to approach this?

Maybe yes, maybe no.

Add in collapsed_forwarding (so concurrent misses for the same object
are merged into one fetch from Apache) and persistent connections, and
you have one shining star of scalability.
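collapsed_forwarding is what stops a burst of requests for one large,
not-yet-cached file from turning into a burst of identical fetches
against Apache: the first miss goes to the origin and the others wait
for its result. Here is a minimal Python sketch of that idea using
threads; the class and function names are hypothetical, and this is
not how Squid implements it internally.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional


class _InFlight:
    """One origin fetch in progress that other requests can wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.body: Optional[bytes] = None
        self.error: Optional[BaseException] = None


class CollapsingFetcher:
    """Merge concurrent misses for the same URL into a single origin fetch."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _InFlight] = {}

    def get(self, url: str) -> bytes:
        with self._lock:
            call = self._in_flight.get(url)
            is_leader = call is None
            if is_leader:
                call = _InFlight()
                self._in_flight[url] = call
        if is_leader:
            # The first request for this URL does the real origin fetch.
            try:
                call.body = self._fetch(url)
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._in_flight[url]
                call.done.set()
        else:
            # Later requests block until the leader's fetch finishes.
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.body


if __name__ == "__main__":
    origin_hits = []  # one entry per request that actually reached the origin

    def slow_origin(url: str) -> bytes:
        origin_hits.append(url)
        time.sleep(0.2)  # simulate a large file coming off a busy disk
        return b"contents of " + url.encode()

    fetcher = CollapsingFetcher(slow_origin)
    with ThreadPoolExecutor(max_workers=10) as pool:
        bodies = list(pool.map(fetcher.get, ["/library/big.iso"] * 10))
    print(f"{len(bodies)} requests, {len(origin_hits)} origin fetch(es)")
```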

You could even scale to multiple Apache servers, each holding a
different part of the library, with Squid routing requests to the
right one.
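Splitting the library across several Apache boxes means the front end
must know which origin holds which path. In squid.conf this is
typically expressed with ACLs and `cache_peer_access` rules; the Python
sketch below only shows the routing decision itself, as a longest-prefix
match over a hypothetical table of path prefixes and origin names.

```python
from typing import Dict

# Hypothetical split of the library across origin servers.
ROUTES: Dict[str, str] = {
    "/video/": "apache-video",
    "/audio/": "apache-audio",
    "/": "apache-default",  # catch-all for everything else
}


def pick_origin(path: str, routes: Dict[str, str] = ROUTES) -> str:
    """Return the origin whose prefix is the longest match for ``path``."""
    for prefix in sorted(routes, key=len, reverse=True):
        if path.startswith(prefix):
            return routes[prefix]
    raise LookupError(f"no origin configured for {path!r}")


if __name__ == "__main__":
    for p in ["/video/intro.mp4", "/audio/track1.ogg", "/docs/manual.pdf"]:
        print(p, "->", pick_origin(p))
```

Checking the longest prefix first keeps the catch-all from shadowing
the more specific sections, so adding another section of the library
is one more table entry.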

Just note that for files of a megabyte or so held in memory, Squid-2
is a snail, and Squid-3 does not yet provide collapsed forwarding.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE6 or 3.0.STABLE16
  Current Beta Squid 3.1.0.9