RE: large squid machines

The IOPS off the SAN are the current bottleneck; actually, I believe
it's the per-disk command queue that's getting backed up as well. This
is why I'm looking to deploy squid/frontend caching services to ease
the pain.

So this is the question: RAM or fast disk drives? Maybe multiple
squids, trying to force the images into memory and the large content
to disk?
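
As a rough sketch of that split, assuming Squid 2.6 syntax: something
like the following keeps the small images in the memory cache and lets
the larger content fall through to disk. The sizes here are
illustrative, not tuned values:

  # squid.conf: keep small hot objects (the 5-80KB images) in RAM
  cache_mem 12288 MB
  maximum_object_size_in_memory 80 KB

  # larger objects spill to the disk cache; diskd keeps disk I/O out
  # of the main squid process
  cache_dir diskd /cache/squid 200000 16 256

Objects at or under maximum_object_size_in_memory are eligible to stay
in RAM; anything bigger is only a candidate for the disk store.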

Unfortunately this isn't a home proxy setup or my life would be much much
easier.

It's clear that the more disk cache you have, the longer it takes to
access, etc., even running diskd. But what about, say, 16, 24, or 32GB
of memory acting as a ramdisk?
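
One way to do the ramdisk variant on Linux, with made-up paths and
sizes: mount a tmpfs and point a cache_dir at it, leaving headroom
under the tmpfs size for swap.state and metadata. The plain ufs store
is enough here, since async I/O (aufs/diskd) buys little when the
backing store is already RAM:

  # mount a 16GB tmpfs (add it to /etc/fstab too; the cached objects
  # are lost on reboot either way)
  mount -t tmpfs -o size=16g tmpfs /cache/ram

  # squid.conf: a cache_dir on the tmpfs, sized below the mount
  cache_dir ufs /cache/ram 14000 16 256

The obvious caveat is that squid rebuilds an empty cache after every
reboot.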

-
Zak


There are FAR more images than movies getting hit on this network.

On Wed, Apr 04, 2007, Zak Thompson wrote:
> Here is my scenario.
> 
> I have an IBM DS4200 SAN with 5 dual dual-core 2.0GHz, 6GB machines
> attached to the SAN.
> The SAN is SATA-driven: two LUNs of 2TB each, which have been LVM'd
> into ~4TB. All 5 machines are Red Hat ES 4 64-bit, and they are all
> running GFS. (We will be adding more drives to the arrays and adding
> LUNs to get some more IO.)
> 
> Anyway, I have a bottleneck: it's called IOPS. We are currently
> turning on a few sites a week onto the cluster; today we hit a brick
> wall and maxed out our IOPS. So I've been researching squid setups
> and reverse proxy setups all day.

Are you talking about IOPS off the SAN?

> I have squid up and running on one of the machines, and squid is
> using a tmpfs/ramdisk for cache_dir, which seems to be working great.
> The problem is we need to scale up to around 600 req/second to make
> this cluster perform the way it should, so now we are looking into
> deploying 2 or 3 squid servers to act as a frontend. We can do some
> load balancing rules to send all image/movie/static content off to
> the squid servers, which isn't a problem; the problem is configuring
> everything to run smoothly. There is currently 3.4TB of data on the
> SAN itself that needs accessing. Ideally we'd like to keep the most
> requested/cacheable content on the front servers in hopes of speeding
> everything up.

Squid will be mostly fine for that. The biggest CPU speedup for serving
large cached content in Squid will be to fix the way it does memory lookups.
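
For the front-end piece described above, a minimal Squid 2.6
accelerator config might look like the following; the hostname and
origin address are placeholders, not details from this thread:

  # listen as an accelerator in front of the origin cluster
  http_port 80 accel defaultsite=www.example.com vhost

  # one of the GFS-backed web servers as the origin (address made up)
  cache_peer 10.0.0.10 parent 80 0 no-query originserver name=origin

  # only accelerate requests for our own sites
  acl our_sites dstdomain .example.com
  http_access allow our_sites
  http_access deny all
  cache_peer_access origin allow our_sites

The load balancer in front would then split the image/movie/static
requests across two or three boxes running a config like this.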

> The data is split into images 5k-80k in size, then movies
> (avi/wmv/mov) which are 10MB to 90MB, and then there are a couple
> dozen ~1GB files. Obviously we wouldn't want to cache the big
> files... or do we?

Squid won't perform all that great caching gigabyte-sized files. The
images, sure. The movies, somewhat. Squid-2 will get better at all of
this over time. (Well, when Henrik or I get time, or someone
contributes some patches..)
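
If the call is not to cache the big files, a sketch of two ways to say
so in squid.conf; the size cap and the URL pattern are placeholders:

  # hard cap: responses over 100MB are passed through, never stored
  maximum_object_size 100 MB

  # or name them explicitly with the Squid 2.6 'cache' directive
  # (the extensions here are hypothetical)
  acl huge_files urlpath_regex -i \.(iso|mpg)$
  cache deny huge_files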

> I've been looking all day in the archives and see the disk option and
> the RAM option, but I have never seen a good example of someone using
> squid on 16GB+ of RAM. We are looking into getting two machines right
> away. We can do multiple 15k RPM SAS drives and/or RAM: we can get 4
> machines for the price of 2 machines with 24-32GB of RAM, and the 5
> machines would have 6x 15k RPM SAS drives and 8GB of RAM or
> thereabouts.

> So has anyone ever heard of, or done, a deployment this large? Is
> Squid not the best method for doing this? All in all these machines
> should be pumping around 500Mbps-700Mbps (it's a lot of movie
> downloads).

There are squid deployments this large! But:

* The people doing active squid development don't have kit of that
  size at home (my largest machine stops at 2 gigabytes of RAM), which
  makes it difficult for us to work well with larger setups; and
* Those who deploy large squid installs aren't very active helpers on
  this list..




Adrian


