I wrote:
>> I'm using Squid (Debian Lenny stable 3.0.STABLE8-3) as a mechanism for
>> pre-caching large downloads for a number of client hosts that are
>> powered down during the night, so that the files they require are
>> ready for them to download in the morning. In the late afternoon each
>> of the clients connects to Squid, starts downloading each file and
>> then disconnects as soon as the data starts flowing. I set
>> "quick_abort_min -1" to ensure that Squid continues with the download
>> regardless.
>>
>> I now need to limit the bandwidth Squid uses while caching the files.
>> I initially experimented with Linux traffic control, but after a while
>> the server simply stopped sending packets on some of the connections
>> because they were making no progress, and fiddling with timeouts
>> didn't fully cure it. Squid's built-in delay pool system worked much
>> better and didn't drop a single connection, even when concurrently
>> downloading a hundred large files at 100kbit/s. The fact that more
>> bandwidth is available during certain hours is also easy to handle
>> with "acl time"[1].

[snip]

>> But, as the FAQ states, once the client disconnects there is no longer
>> any link back to the delay pool and the download proceeds at full
>> speed. :(
>>
>> I've been trying to come up with workarounds so that I can keep both
>> the bandwidth-shaping behaviour and the slow abort.
>>
>> My understanding of the Squid code is minimal, but looking at
>> MemObject::mostBytesAllowed() I wonder whether MemObject could store
>> the last DelayId so that it has something to fall back on when the
>> clients list is empty. This may be ineffective if all clients have
>> disconnected before the first read, but perhaps that could be fixed by
>> setting the persistent DelayId in MemObject::addClient() too.

On Fri, Jun 12, 2009 at 03:42:43AM +1200, Amos Jeffries wrote:
> This may be suitable for your needs, but is not optimal elsewhere. Why
> should the last visiting client be penalized for an admin configuration
> choice?

True. Perhaps a dedicated mopping-up DelayId would be more appropriate.

>> Alternatively, I've wondered whether I could write a redirector or
>> external ACL helper which effectively ran wget through the proxy for
>> every URL received, taking care not to submit the same URL twice. I
>> believe this solution could easily be extended to retry interrupted
>> downloads too, which would be a bonus.

> You could. Noting that client requests which test it will hang until
> the ACL returns, so it's probably better to scan the log periodically
> and re-fetch.

I was intending just to start the wget asynchronously, but I've ended up
using a redirector to write the URLs to a log file which I scan
independently, as you describe. This makes the system more robust in the
event of reboots and other failures. I would have used Squid's own log
file or the cache manager interface, but I couldn't fathom a way to see
the full URLs, including any query parameters.

Thank you for your reply.

Mike.
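
P.S. In case it helps anyone else, the throttling side of this boils
down to a handful of squid.conf lines roughly like the sketch below. The
ACL names, subnet, rates, hours and helper path are placeholders rather
than my exact values; the idea is a single aggregate delay pool that
only applies during working hours, plus the "never abort" setting and
the redirector hook:

  # Keep fetching cached objects even after every client has gone away.
  quick_abort_min -1 KB

  # Throttle only during working hours; off-peak fetches run unshaped.
  acl localnet src 192.168.0.0/24
  acl workhours time MTWHF 08:00-18:00

  delay_pools 1
  delay_class 1 1                  # class 1 = one aggregate bucket
  delay_parameters 1 12500/12500   # ~100kbit/s, in bytes per second
  delay_access 1 allow localnet workhours
  delay_access 1 deny all

  # Log every requested URL for the pre-cache fetcher (script below).
  url_rewrite_program /usr/local/bin/precache-log.py
  url_rewrite_children 5

The redirector itself is tiny. A minimal sketch along the lines of what
I described is below (Python here, and the log path is made up for the
example; my real script also de-duplicates URLs). Squid 3.0 hands the
helper one request per line on stdin with the URL as the first field,
and as far as I can tell writing back a blank line means "leave the URL
alone":

  #!/usr/bin/env python
  # Append each requested URL to a log file; never rewrite anything.
  import sys

  LOGFILE = "/var/log/squid/precache-urls.log"  # placeholder path

  def main():
      log = open(LOGFILE, "a")
      for line in sys.stdin:
          fields = line.split()
          if fields:
              log.write(fields[0] + "\n")  # first field is the URL
              log.flush()
          sys.stdout.write("\n")           # blank reply: no rewrite
          sys.stdout.flush()               # Squid needs unbuffered replies

  if __name__ == "__main__":
      main()

The independent scanner is then just a cron job that picks up new URLs
from that log and fetches each one through the proxy with something like
"http_proxy=http://127.0.0.1:3128/ wget -q -O /dev/null <url>", which is
also where retrying interrupted downloads slots in.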