On Tue, 23 Jun 2009 21:18:36 -0200, "Ronan Lucio" <listas@xxxxxxxxxxxx> wrote:
> Hi Kinkie,
>
> On Tue, 23 Jun 2009 21:51:17 +0200, Kinkie wrote
>> Hi,
>> I can't see the advantage of using lighthttpd instead of squid+carp
>> as the frontend,
>
> The idea of putting a lighttpd server as a the frontend is for load
> balance.
>
> What exactly do you mean with squid+carp? several squid servers working as
> one?

Squid placed as the load balancer, using the CARP selection algorithm for
the balancing. These top-layer Squid generally don't cache to disk, but run
as memory-only, very high throughput services. CARP ensures that URLs are
sent consistently to the second layer of Squid for the most efficient
(non-duplicate) caching and failover.

The Wikimedia deployment does it this way for their front-end.
http://meta.wikimedia.org/wiki/File:Wikimedia-servers-2009-04-05.svg

> Can I have it working in an external DataCenter?

Most likely. As with any HTTP hierarchy the location of the individual hops
is not relevant to the traffic flow. But for the best performance the
underlying network topology and capacity should be considered.

What Squid+CARP offers that lighttpd does not, AFAIK, is the CARP algorithm
for 'sticky' URLs, so that objects are not duplicated across all the caches.
Some duplicate slippage may occur when peers die/return, but it's much less
than would normally occur.

> If so it seems to be a better solution, even because it's a fault tolerance
> solution.
>
>> and if using lighthttpd i can't see the advantage of
>> not serving static content directly out of the balancer.
>
> Actually, I'm just afraid of overload the server.
> Initially I don't know exactly how much resources would it consume from
> each
> server.
> If a server like that fits executing two roles, I'm sure it would be
> better.
>
>> Also watch out as nfs has locking and scaling issues of its own
>> (assuming thet nfs is what you mean by "single filesystem"), and it
>> also introduces a very nasty point-of-failure.
>
> Yes, it's a NAS.

What Kinkie means is that the efficiency is determined by the type of NAS.

Squid performs high-churn random-access IO when in full operation, and
needs to do so at the highest possible speed. The absolute _best_ solution
is to have a high-speed storage device spindle dedicated to just one
cache_dir in one Squid. None of the SAN/NAS I'm aware of can make that kind
of fine-grained assignment.

That said, modern hardware SAN/NAS solutions and even some newer software
ones are very efficient and can provide useful service levels. But the
traditional NFS and Samba file-system NAS can potentially introduce serious
speed issues when scaled. 1ms may not sound like much IO wait, but when it
affects 3K concurrent connections simultaneously (i.e. one loaded Squid) it
scales up towards a 3-second delay on every request.

>
> Kinkie, the architecture shouldn't be that suggested from me.
> It's just how I could figure out. Of course I want to make it better.
> Do you have a suggestion for that?
>
> For all I have understood your suggestion is:
>
> 1) Some squid servers + carp

Almost...
 1a) Squid load balancer using CARP to select layer-2.
 1b) Squid caching servers
then whatever backend you like...

>
> 2) Application server as the backend servers
>
> 3) A third server serving static resources

It's up to you whether the system is large enough to require separate (2)
and (3). For a small enough system (for some value of small) it's sufficient
to combine them and provide caching headers that push most of the static
work into the (1b) Squid.
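
Roughly, in squid.conf terms, the two layers could look something like the
untested sketch below. The hostnames, ports, paths and cache_dir sizes are
only placeholder values, not anything from your setup:

  # --- 1a) front-end balancer: no disk cache at this layer ---
  http_port 80 accel defaultsite=www.example.com vhost
  cache_peer cache1.example.com parent 80 0 carp no-query
  cache_peer cache2.example.com parent 80 0 carp no-query
  never_direct allow all      # always forward to one of the 1b peers
  cache deny all              # store nothing at this layer

  # --- 1b) caching layer (same config on each peer) ---
  http_port 80 accel defaultsite=www.example.com vhost
  cache_peer app1.example.com parent 8080 0 no-query originserver
  # one cache_dir per dedicated local disk, kept off the NFS/NAS
  cache_dir aufs /cache1 40000 16 256
  cache_dir aufs /cache2 40000 16 256

The 'carp' option on the cache_peer lines is what gives the 'sticky'
URL-to-peer hashing mentioned above; without it the parents are picked by
the normal peer selection and the same objects end up cached on several of
them. If you would rather have the 1a layer keep a small memory-only cache
instead of storing nothing, that comes down to cache_mem and the cache_dir
handling of your Squid version.
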
>
> I just didn't figure out your suggestion for storage.

Hopefully my comment above has clarified that a little.

Amos