Re: Re: can squid load data into cache faster than sending it out?

On 12/05/11 08:18, Dave Dykstra wrote:
On Wed, May 11, 2011 at 09:05:08PM +1200, Amos Jeffries wrote:
On 11/05/11 04:34, Dave Dykstra wrote:
On Sat, May 07, 2011 at 02:32:22PM +1200, Amos Jeffries wrote:
On 07/05/11 08:54, Dave Dykstra wrote:
Ah, but as explained here
     http://www.squid-cache.org/mail-archive/squid-users/200903/0509.html
this does risk using up a lot of memory because squid keeps all of the
read-ahead data in memory.  I don't see a reason why it couldn't instead
write it all out to the disk cache as normal and then read it back from
there as needed.  Is there some way to do that currently?  If not,

Squid should be writing to the cache in parallel with the data
arrival; the only part required in memory is the part queued for
sending to the client.  That part gets bigger, and bigger... up to the
read_ahead_gap limit.
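
For reference, the gap described here is the read_ahead_gap directive
in squid.conf. A minimal illustration, with an arbitrary value (the
shipped default is 16 KB):

  # Bound how far Squid reads ahead of the slowest client on a server
  # connection; this also caps the per-connection memory queue.
  read_ahead_gap 64 KB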

Amos,

Yes, it makes sense that it's writing to the disk cache in parallel, but
what I'm asking for is a way to get squid to keep reading from the
origin server as fast as it can without reserving all that memory.  I'm
asking for an option to not block the reading from the origin server and
writing to the cache when the read_ahead_gap is full, and instead read
data back from the cache to write it out when the client is ready for
more.  Most likely the data will still be in the filesystem cache so it
will be fast.
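
In outline, what I'm asking for would look something like the sketch
below. It is purely illustrative: relay(), origin, cache_entry and
client are invented names, not Squid's actual store API.

  # Hypothetical sketch: read from the origin at full speed, spill
  # everything to the disk cache, and feed a slow client back from the
  # cache instead of from an in-memory queue.
  def relay(origin, cache_entry, client, chunk=16 * 1024):
      sent = 0
      while not origin.eof or sent < cache_entry.size:
          if not origin.eof:
              data = origin.read(chunk)     # never blocks on the client
              cache_entry.append(data)      # write-through to disk cache
          if client.ready() and sent < cache_entry.size:
              data = cache_entry.read(sent, chunk)  # usually still hot
              client.send(data)                     # in the fs cache
              sent += len(data)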

That will have to be a configuration option. We had a LOT of
complaints when we accidentally made several 3.0 releases act that way.

That's interesting.  I'm curious about what people didn't like about
it; do you remember the details?


The bandwidth overflow mentioned below.


...
perhaps I'll just submit a ticket as a feature request.  I *think* that
under normal circumstances in my application squid won't run out of
memory, but I'll see after running it in production for a while.

So far I haven't seen a problem but I can imagine ways that it could
cause too much growth so I'm worried that one day it will.

Yes, both approaches lead to problems.  The trickle-feed approach
used now leads to resource holding on the server.  Not doing it leads
to bandwidth overload, as Squid downloads N objects for N clients and
only has to send back one packet to each client.
  So it's a choice between being partially vulnerable to "slow loris"
style attacks (timeouts etc. prevent full vulnerability) and packet
amplification on a massive scale.
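
Back-of-envelope, with made-up numbers, to show the scale of the
second case:

  # N slow clients, each requesting a distinct URL, while Squid pulls
  # each object from the origin at full server-side speed.
  clients = 1000
  client_rate = 10 * 1024       # bytes/sec each client actually drains
  server_rate = 1024 * 1024     # bytes/sec Squid could pull per connection
  upstream = clients * server_rate    # pulled with no read-ahead gap
  downstream = clients * client_rate  # what the clients consume
  print(upstream // 2**20, "MB/s in,", downstream // 2**20, "MB/s out")
  # -> 1000 MB/s in, 9 MB/s out: roughly a hundredfold amplification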

Just to make sure I understand you, in both cases you're talking about
attacks, not normal operation, right?  And are you saying that it is
easier to mitigate the trickle-feed attack than the packet-amplification
attack, so trickle-feed is less bad?  I'm not so worried about attacks
as normal operation.


Both are real traffic types; the attack form is just the same pattern artificially induced to make it worse. Like ping-flooding in the '90s, it happens in normal traffic, but not often. All it takes is a large number of slow clients requesting non-identical URLs.

IIRC it was noticed worst by cellphone networks with very large numbers of very slow GSM clients. A client connects and sends a request, Squid reads back N bytes from the server and sends N-M to the client. Repeat until all the FDs available in Squid are consumed, during which time M bytes of packets are overflowing the server link for every 2 FDs used. If the total of all the Ms is greater than the server link capacity...

Under the current design the worst case is the server running out of FDs first and rejecting new connections, or TCP protections dropping connections and Squid aborting the clients early. The overflow factor is 32K or 64K, linear with the number of FDs, and can't happen naturally where the client does read the data, just slowly.
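
Rough numbers for that bound, again invented for illustration:

  # Worst case under the current trickle-feed design: the overflow is
  # linear in the FD limit, using the 64K-per-pair figure above.
  max_fds = 1024         # Squid's file descriptor limit
  gap = 64 * 1024        # read-ahead held per client/server FD pair
  pairs = max_fds // 2   # each slow client ties up ~2 FDs
  print(pairs * gap // 2**20, "MB in flight at most")  # -> 32 MB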

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.12
  Beta testers wanted for 3.2.0.7 and 3.1.12.1

