Re: Cache ran out of descriptors due to ICAP service/TCP SYNs ?

"Ahmad, Sarfaraz" <Sarfaraz.Ahmad@xxxxxxxxxx> · Wed, 18 Jul 2018 06:30:46 +0000

Thanks for the reply. I haven't completely understood it and have a few more related questions.

I see these messages:
Jul 17 19:21:14 proxy2.hyd.deshaw.com squid[5747]: suspending ICAP service for too many failures
Jul 17 19:21:14 proxy2.hyd.deshaw.com squid[5747]: optional ICAP service is suspended: icap://127.0.0.1:1344/reqmod [down,susp,fail11]
1)   If the ICAP service is unresponsive, Squid would not exhaust its file descriptors by trying to reach the service again and again (too many TCP SYNs while trying to connect to the ICAP service), right?

The Max-Connections value returned by the ICAP service is 16. My ICAP settings are:
icap_enable on
icap_service test_icap reqmod_precache icap://127.0.0.1:1344/reqmod bypass=on routing=off on-overload=wait
on-overload is set to "wait". The documentation says "wait: wait (in a FIFO queue) for an ICAP connection slot". This means that a new TCP connection would not be attempted once max connections is reached, right?
2)   Am I right in saying that if the ICAP service is underperforming or has failed, this won't lead to a sudden increase in open file descriptors with on-overload set to "wait"?
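
To make the "wait" semantics in question concrete, here is a minimal Python sketch of a connection-slot limiter with a FIFO wait queue. It is only a model of the behaviour the documentation describes, not Squid's implementation; the class and function names (`IcapSlots`, `simulate`) are hypothetical.

```python
from collections import deque


class IcapSlots:
    """Toy model of a Max-Connections limit with a FIFO wait queue."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.open_connections = 0
        self.waiting = deque()  # transactions queued for a free slot

    def request(self, txn):
        """Return True if txn gets a connection now, False if it is queued."""
        if self.open_connections < self.max_connections:
            self.open_connections += 1
            return True
        self.waiting.append(txn)  # no new connection is opened
        return False

    def release(self):
        """A transaction finished; pass its slot to the oldest waiter, if any."""
        if self.waiting:
            return self.waiting.popleft()  # slot is reused, count unchanged
        self.open_connections -= 1
        return None


def simulate(max_connections, n_requests):
    """Send n_requests at once and report (open connections, queued)."""
    slots = IcapSlots(max_connections)
    for txn in range(n_requests):
        slots.request(txn)
    return slots.open_connections, len(slots.waiting)


print(simulate(16, 100))
```

In this model the number of ICAP connections never exceeds the limit, but every queued transaction still belongs to a client request that is being held, so the model says nothing about how many client-side descriptors stay open while the queue grows.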

Also I have no way to explain the "connection reset by peer" messages.
Jul 13 11:23:18 <hostname> squid[13123]: Error negotiating SSL connection on FD 1292: (104) Connection reset by peer
Jul 13 11:23:18 <hostname> squid[13123]: Error negotiating SSL connection on FD 1631: (104) Connection reset by peer
Jul 13 11:35:17 <hostname> squid[13123]: Error negotiating SSL connection on FD 1331: (104) Connection reset by peer

I have a few proxies (running in separate virtual machines). All of them went unresponsive at around the same time, leading to an outage of the internet.
I am using WCCPv2 to redirect from firewall to these proxies.  I checked the logs there and WCCP communication was not intermittent.
The logs on the proxies are flooded with "Error negotiating SSL connection on FD 1331: (104) Connection reset by peer" messages.
Since the ICAP service is not SSL-protected, I think these messages mostly imply receiving TCP RSTs from remote servers (or could it be clients somehow?). Once I removed the WCCP redirection rules from the firewall, internet access was back up.
This suggests that the problem was somewhere in the proxy pipeline and not with the internet link itself; I don't see any outages on the link.
I am pretty sure ACLs weren't changed and there was no forwarding loop.
What could possibly explain the "connection reset by peer" messages? Even if the internet link was down, that would not lead to TCP RSTs.
I cannot tie these TCP RSTs to the incoming requests getting held up and ultimately leading to FD exhaustion.
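
One way to see whether the resets cluster around the time of the crashes is to tally them per minute. The following is a rough Python sketch that assumes the syslog line format quoted above; the function name `resets_per_minute` is hypothetical, and the sample lines are the three shown earlier in this message.

```python
import re
from collections import Counter

# Matches syslog lines of the form quoted in this message and captures
# the timestamp truncated to the minute.
PATTERN = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+squid\[\d+\]:\s+"
    r"Error negotiating SSL connection on FD \d+: \(104\)"
)


def resets_per_minute(lines):
    """Count 'Connection reset by peer' TLS errors per minute."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match.group("minute")] += 1
    return counts


SAMPLE = [
    "Jul 13 11:23:18 <hostname> squid[13123]: Error negotiating SSL connection on FD 1292: (104) Connection reset by peer",
    "Jul 13 11:23:18 <hostname> squid[13123]: Error negotiating SSL connection on FD 1631: (104) Connection reset by peer",
    "Jul 13 11:35:17 <hostname> squid[13123]: Error negotiating SSL connection on FD 1331: (104) Connection reset by peer",
]

for minute, count in sorted(resets_per_minute(SAMPLE).items()):
    print(minute, count)
```

Run against the full log, this would show whether the resets begin before or after the squid-1 respawns.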

You earlier said 
>> In normal operation it is not serious, but you are already into abnormal operation by the crashing. So not releasing sockets/FD fast enough makes the overall problem worse.
If squid-1 is crashing and getting respawned, it will have its own 16K FD limit, right? I wonder how the newer squid-1 serves older requests. Can you please elaborate on "So not releasing sockets/FD fast enough makes the overall problem worse"?

Please share your thoughts.

Regards,
Sarfaraz

-----Original Message-----
From: squid-users <squid-users-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Amos Jeffries
Sent: Tuesday, July 17, 2018 6:22 PM
To: squid-users@xxxxxxxxxxxxxxxxxxxxx
Subject: Re:  Cache ran out of descriptors due to ICAP service/TCP SYNs ?

On 17/07/18 19:17, Ahmad, Sarfaraz wrote:
> Can somebody please explain what could have happened here?
> 
>  
> 
> First squid(4.0.25) encountered a URL > 8K bytes. I think this caused 
> it to crash.
> 

Unless you patched the MAX_URL definition to be larger than the default, that should not happen. So it is a bug IMO.

If you did patch MAX_URL, then you have encountered one of the many hidden issues why we keep it low and <https://bugs.squid-cache.org/show_bug.cgi?id=4422> open. Any assistance finding out where that crash occurs is VERY welcome.

>  
> 
> Jul 13 11:04:13 <hostname> squid[9102]: parse URL too large (9697 
> bytes)
> 
> Jul 13 11:04:13 <hostname> squid[29254]: Squid Parent: squid-1 process
> 9102 exited due to signal 11 with status 0
> 
>  
> 
> squid-1 was respawned by the parent squid process.
> 
>  
> 
> Then I see ,
> 
> WARNING: ICAP Max-Connections limit exceeded for service 
> icap://127.0.0.1:1344/reqmod. Open connections now: 16, including 0 
> idle persistent connections.
> 
> The newly spawned squid-1  crashes yet again. As seen below,
> 
> Jul 13 11:16:14 <hostname> squid[29254]: Squid Parent: squid-1 process
> 10951 exited due to signal 11 with status 0
> 
> Logs don’t explain why squid-1 crashed here. ICAP message above is 
> just a warning.

In normal operation it is not serious, but you are already into abnormal operation by the crashing. So not releasing sockets/FD fast enough makes the overall problem worse.

From the log and config below, it appears that this ICAP service is *optional* (bypass=on). So Squid is free to ignore its use entirely if FDs run out.
That is probably why it is only listed as a WARNING. But it is still consuming FDs before it gets to that state.

> 
> squid-1 is respawned a second time and I see,
> 
>  
> 
> Jul 13 11:22:18 <hostname> squid[13123]: ERROR: negotiating TLS on FD
> 1722: error:14090086:SSL
> routines:ssl3_get_server_certificate:certificate verify failed 
> (1/-1/0)
> 

Look into why. This along with your crash are the top two issues adding to the overall situation.
>  
> 
> There is only one icap service defined as below :
> 
>  
> 
> icap_enable on
> 
> icap_service test_icap reqmod_precache icap://127.0.0.1:1344/reqmod 
> bypass=on routing=off on-overload=wait
> 

>  
> 
> The open file ulimit is set to 16k. How many TCP connections would 
> Squid have opened up that it exhausted 16k file descriptors ?  Some 
> sort of file descriptor leak ?

Only if your traffic is high enough to leak that fast.

More likely is a forwarding loop situation, where one outbound server connection consumes _infinite_ FD sockets.
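
As a rough illustration of how such a loop can be recognised, a proxy can look for its own name in the Via header of an incoming request. The Python sketch below is a simplified model, not Squid's code: the helper names and the `proxyN.example` host names are hypothetical, and the parser assumes Via comments contain no commas.

```python
def via_hosts(via_header):
    """Extract the received-by names from a Via header value."""
    hosts = []
    for entry in via_header.split(","):
        parts = entry.split()
        # Each entry looks like "<protocol-version> <received-by> [comment]".
        if len(parts) >= 2:
            hosts.append(parts[1].lower())
    return hosts


def is_forwarding_loop(via_header, my_name):
    """True if this proxy's own name already appears in the Via header."""
    return my_name.lower() in via_hosts(via_header)


via = "1.1 proxy1.example (squid/4.0.25), 1.1 proxy2.example (squid/4.0.25)"
print(is_forwarding_loop(via, "proxy2.example"))
print(is_forwarding_loop(via, "proxy3.example"))
```

Each pass around a loop holds one inbound and one outbound socket, which is why a single looping request can consume descriptors until the limit is hit.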

> 
> I am unable to connect the dots where an unresponsive ICAP service 
> led to the proxy running out of file descriptors ?  Too many TCP SYN attempts ?
> 

Those two are probably unrelated. Unless it is the ICAP socket being looped back to Squid, or your traffic req/sec is extremely high and one of a few ICAP connection bugs is occurring (e.g. lack of a way to cleanly signal a connection error to ICAP).

Amos
_______________________________________________
squid-users mailing list
squid-users@xxxxxxxxxxxxxxxxxxxxx
http://lists.squid-cache.org/listinfo/squid-users