Re: Q: http keepalives and time_wait sockets

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Mon, 19 Apr 2010 23:58:02 +0000

On Mon, 19 Apr 2010 16:38:10 +0200, Gaetano Giunta
<giunta.gaetano@xxxxxxxxx> wrote:
> Amos Jeffries wrote:
>> Gaetano Giunta wrote:
>>> Q1: Reading the archives of this mailing list, I concluded that squid 
>>> does not support using keep-alive in connections to source servers.
>>>
>>> I assume that setting keep-alive On on an Apache source server cached 
>>> by squid would thus be harmless: since squid does not do keepalives, 
>>> the connections would be terminated immediately - Apache would waste 
>>> time keeping processes busy waiting on sockets after squid terminated 
>>> its http/1.0 request.
>>>
>>> The advantage being that if the same apache server serves some other 
>>> site beside the one cached by squid, or if squid is disabled 
>>> temporarily, keepalives will be automatically in effect.
>>>
>>> Is this correct? Are there any advantages/drawbacks that are escaping 
>>> me?
>>
>> The conclusion that Squid does not support keepalive is incorrect. 
>> Note the default config setting.
>> http://www.squid-cache.org/Doc/config/server_persistent_connections/
>>
> Sorry, I must have confused it with HTTP 1.1 (I was not fully aware of 
> HTTP 1.0 keep alives).
> 
> One more question about keepalives is then the following: is a keepalive
> connection to the origin server only used for a single client, or can it
> be used for requests coming from many different clients?

In general Squid just pools the server pconn and reuses it for any request
to that domain name (to cope with virtual hosting). Squid-2 also ties in the
client IP address and does something I don't quite follow completely. Recent
Squid-3 does that only when NTLM or Kerberos needs the connection to be
pinned (older 3.0 did something similar to 2.x, but broken).
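
To illustrate the pooling described above, here is a minimal Python sketch.
It is a toy model, not Squid's code: the class name ServerConnPool, its
methods and the string "connections" are all hypothetical. It only shows
that an idle server connection is looked up by origin host name, so a
request from a second client can reuse the connection opened for the first.

```python
from collections import defaultdict, deque


class ServerConnPool:
    """Toy model of an idle server-connection pool keyed by origin host name."""

    def __init__(self):
        self._idle = defaultdict(deque)  # host name -> idle connections
        self._next_id = 0

    def acquire(self, host):
        """Return (conn, reused). The requesting client is not part of the key."""
        if self._idle[host]:
            return self._idle[host].popleft(), True
        # No idle connection for this host: pretend to open a new one.
        self._next_id += 1
        return f"conn-{self._next_id}->{host}", False

    def release(self, host, conn):
        """Put a still-open connection back into the idle pool."""
        self._idle[host].append(conn)


pool = ServerConnPool()
conn_a, reused_a = pool.acquire("www.example.com")  # miss triggered by client A
pool.release("www.example.com", conn_a)
conn_b, reused_b = pool.acquire("www.example.com")  # miss triggered by client B
```

In this sketch conn_b is the same connection as conn_a. Pinning for
NTLM/Kerberos would amount to keeping such a connection out of the shared
pool for as long as it is tied to one client.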

> If the former, is the keepalive connection to the origin server kept 
> open as long as the one with the client is?

In the general case, longer. It is only closed when too many FDs are in
use, when the server closes the connection, when pconn_timeout is
configured and expires, or when Squid is restarted/reconfigured.
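
The close conditions listed above can be written down as a small decision
function. This is only a sketch under assumptions: the function name, its
parameters and the simple "fds_in_use >= fd_limit" test are hypothetical
and are not how Squid measures FD pressure. Only the four reasons
themselves come from the text.

```python
def idle_close_reason(idle_seconds, pconn_timeout, fds_in_use, fd_limit,
                      server_closed=False, reconfiguring=False):
    """Return why an idle server pconn would be closed, or None to keep it."""
    if server_closed:
        return "server closed the connection"
    if reconfiguring:
        return "restart/reconfigure"
    if fds_in_use >= fd_limit:  # assumed threshold, for illustration only
        return "too many FDs in use"
    if pconn_timeout is not None and idle_seconds >= pconn_timeout:
        return "pconn_timeout expired"
    return None


# An idle connection with no pressure and no timeout configured stays open,
# regardless of what happened to the client connection that caused it.
still_open = idle_close_reason(idle_seconds=300, pconn_timeout=None,
                               fds_in_use=40, fd_limit=1024)
```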

> In short: is there a recommendation about setting the keepalivetimeout /
> maxkeepaliverequests parameters for a high traffic apache server proxied
> by squid, where the squid cache hit rate is about 80%?

I don't have any opinion on that.

> If the average webpage contains eg. 50 objects, with such a cache hit 
> rate a client requesting them all with keepalive would on average 
> generate requests for 10 objects to the origin server - but the requests
> could be 'sparse' in time, for the client would spend some time in 
> between requesting/receiving the objects that squid has in cache...

Yes. Benefit depends on your load and traffic profile.
There is little harm in holding a few server FD open and idle if it
prevents many re-connects.
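
As a rough worked example of that trade-off, using only the figures quoted
above (50 objects per page, 80% hit rate): the variable names are mine, and
the "one connection per page" case is the best case, which assumes the idle
server connection survives between misses.

```python
objects_per_page = 50
hit_percent = 80

# Requests that miss the cache and have to go to the origin server.
origin_requests = objects_per_page * (100 - hit_percent) // 100

connects_without_reuse = origin_requests          # one TCP connect per miss
connects_with_reuse = 1 if origin_requests else 0  # best case: one pconn reused
connects_saved = connects_without_reuse - connects_with_reuse
```

Here origin_requests is 10 and connects_saved is 9 per page view, at the
cost of one server FD sitting idle between misses.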

> 
>> Also, the HTTP/1.0 protocol assumes that keepalive is off unless 
>> explicitly stated as provided. Thus Apache receiving an HTTP/1.0 
>> request without keepalive permitted will either terminate the 
>> connection or send back a reply explicitly requesting the proxy to 
>> do keepalive.
>>
>> When Squid closes any connection the far-end always receives a FIN or 
>> RST TCP message. They are not left hanging waiting for data.
>>
>>>
>>> Q2: Using squid as reverse proxy, I have seen that a lot of sockets 
>>> (200 to 300) are always listed on the origin server, coming from the 
>>> squid server, in TIME_WAIT status.
>>> Using netstat on the squid server itself at the same time, I see 
>>> about 10 open sockets, none of which in TIME_WAIT.
>>> Is this normal? Is it a sign of some misconfiguration? (note: there 
>>> is probably a firewall currently sitting between the two servers). 
>>> Can it be related somehow to keepalives?
>>
>> Maybe yes, maybe no. More likely related to unknown-length objects 
>> with the server closing the connection after each send to signal 
>> end-of-object.
> Interesting.
> I suppose that files (images/css/js) always have a known length, that 
> would leave only requests for dynamically generated objects as possible 
> cause (or responses without bodies?).

Normally yes. IMO unknown-length is a sign of breakage somewhere on most
websites, even dynamic ones. Particularly with template driven dynamic
sites, where the entire size and last-modified of every component is known
before the body starts being sent.
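
To make the link between unknown-length replies and closed connections
concrete, here is a hedged Python sketch of the HTTP/1.0 rule. The function
name can_stay_open is hypothetical and the check is simplified: it assumes
a reply can stay on a persistent connection only if keep-alive is offered
and the body length is known up front (Content-Length, or a bodyless
204/304 reply). Without a length, the only end-of-object signal left is
the server closing the connection.

```python
def can_stay_open(headers, status=200):
    """Simplified HTTP/1.0 check: can this reply leave the connection open?"""
    h = {name.lower(): value.strip().lower() for name, value in headers.items()}
    bodyless = status in (204, 304)
    length_known = bodyless or "content-length" in h
    tokens = [t.strip() for t in h.get("connection", "").split(",")]
    return length_known and "keep-alive" in tokens


static_reply = {"Content-Length": "5120", "Connection": "keep-alive"}
dynamic_reply = {"Content-Type": "text/html", "Connection": "keep-alive"}

static_ok = can_stay_open(static_reply)    # length known: pconn can be reused
dynamic_ok = can_stay_open(dynamic_reply)  # no length: server has to close
```

A dynamic page that sets Content-Length before sending the body would take
the first path instead of the second.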

Amos