AW: Re: keepaliveNextRequest: abandoning FD

"Rietzler, Markus \(RZF, SG 324 / <RIETZLER_SOFTWARE>\)" <markus.rietzler@xxxxxxxxx> · Fri, 30 Nov 2012 12:33:46 +0000

i will keep the original conversation as reference.

we had log entries in cache.log like

2012/11/21 13:09:40 kid1| client_side.cc(1602) keepaliveNextRequest:
abandoning local=123.45.1.1:8080 remote=123.45.6.7:1988 FD 2247
flags=1

we are using squid 3.2.3 (also recent nightly build) together with ufdbguard blocking.

when someone tries to access pages like say www.facebook.com or www.dropbox.com ufdbguard will block those requests.
with http-requests everything is ok. with https-requests the browser just shows a "connection error" message and we see the above log entries in squid. the "connection error" message is also ok (just by design).

in the last days we had two problems: 1) our ntlm-auth-helpers keep hitting the max configures children and then squid stoped responding, 2) I think this is directly connected to 1) we were running out of filedescriptors.

we also tried the nightly build, increased the number of filedescriptors, but also in this case we hit the configured max file descriptors.

our findings:

when you access a https page and ufdbguard blocks it, the connection will be held open (I think there should be some timeout when the connection will be dropped). the problem eg. was with dropbox client running on the PC. the client contantly tries to connect to dropbox via https and thus there were more and more open connection hanging round. we could see that the used filedescriptors become more and more.

we then patched ufdbguard to handle https requests/redirects better. there is now one drawback, we get a certificate error displayed in the browser, because the browser issued a request to http://www.facebook.com and will get the answer with a ssl certificated issued to "OURSERVER". so the ssl certificate does not match and the browser shows this hint/error message. 

but what is more important: we don't see any error message in cache.log and the connection is correctly closed (more see following remarks), no more filedescriptors or ntlm-auth-helpers running wild and crashing.

what is our patch:

in case of a block ofdbguard will return something like

	GET http://ourserver/cgi-bin/blocked.pl

when we tried to access https://www.facebook.com, ufdbguard would also return

	GET http://ourserver/cgi-bin/blocked.pl

so client issued a CONNECT request, but get answered with a GET...

we now return a 

	CONNECT ourserver

in case of a blocked https request. in httpd.conf or ourserver we had a rewrite-rule, that will answer with a /cgi-bin/blocked.pl when we just ask https://ourserver. the blocked.pl also set the http header "Connection: close".
so all the connections will be closed, everything is fine.

even clients like dropbox now through a reasonable error message and stop requesting the page again and again.

> -----Ursprüngliche Nachricht-----
> Von: Amos Jeffries [mailto:squid3@xxxxxxxxxxxxx]
> Gesendet: Donnerstag, 22. November 2012 01:23
> An: squid-users@xxxxxxxxxxxxxxx
> Betreff: Re:  Re: keepaliveNextRequest: abandoning FD
> 
> On 22.11.2012 01:14, Rietzler, Markus (RZF, SG 324 /
> <RIETZLER_SOFTWARE>) wrote:
> > we are seeing the same erros in our cache_log
> >
> > On 12/11/2011 6:06 a.m., Alex Rousskov wrote:
> >> On 11/10/2011 07:11 PM, Amos Jeffries wrote:
> >>> On 11/11/2011 11:01 a.m., Alex Rousskov wrote:
> >>>> Hello,
> >>>>
> >>>> I see thousands of these messages on busy caches:
> >>>>
> >>>>> 2011/11/07 05:16:23 kid1| client_side.cc(1573)
> >>>>> keepaliveNextRequest:
> >>>>> abandoning FD 6650
> >>>>> 2011/11/07 05:16:27 kid3| client_side.cc(1573)
> >>>>> keepaliveNextRequest:
> >>>>> abandoning FD 9180
> >>>>> 2011/11/07 05:16:28 kid5| client_side.cc(1573)
> >>>>> keepaliveNextRequest:
> >>>>> abandoning FD 6361
> >>>>> 2011/11/07 05:16:28 kid6| client_side.cc(1573)
> >>>>> keepaliveNextRequest:
> >>>>> abandoning FD 3322
> >>>>> 2011/11/07 05:16:31 kid2| client_side.cc(1573)
> >>>>> keepaliveNextRequest:
> >>>>> abandoning FD 7809
> >>>>> 2011/11/07 05:16:32 kid3| client_side.cc(1573)
> >>>>> keepaliveNextRequest:
> >>>>> abandoning FD 121
> >>>> The code says:
> >>>>
> >>>>> // XXX: Can this happen? CONNECT tunnels have deferredRequest
> >>>>> set.
> >>>>> debugs(33, DBG_IMPORTANT, HERE<< "abandoning"<<
> >>>>> conn->clientConnection);
> >>>> So, the answer to that XXX question is "yes, it can happen", at
> >>>> least in
> >>>> older v3.2 code. Does anybody know whether there were any recent
> >>>> changes
> >>>> that were meant to address the above?
> >>> Yes the ConnStateData::stopReading() was moved to only happen when
> >>> about
> >>> to call tunnelStart() in client_side_request.cc instead of all
> >>> CONNECT
> >>> method requests.
> >>>
> >>> CONNECT assumes full control over the connection, but ssl-bump uses
> >>> the
> >>> ConnStateData.
> >>
> >> Understood. Since there were related changes, there is no need to
> >> change
> >> the verbosity of the message now. Let's see if those warnings are
> >> still
> >> there after they upgrade to more recent v3.2 releases.
> >>
> >>
> >> Thank you,
> >>
> >> Alex.
> >> P.S. The caches in question do not use SslBump AFAIK.
> >
> > there were also a discussion from may this year
> >
> >> tis 2012-05-01 klockan 08:09 -0500 skrev Guy Helmer:
> >>> I'm working with code I obtained from Alex that was sync'ed with
> >>> trunk as of -r12082 (2012-03-07 v3.2.0.16+) and on a very busy system
> >>> doing forward HTTP and HTTPS proxy (but not sslBump), I am seeing
> >>> lots of these messages:
> >>>
> >>> 2012/05/01 08:50:06 kid1| client_side.cc(1611)
> >>> keepaliveNextRequest: abandoning local=10.10.0.4:3128
> >>> remote=10.10.0.115:1409 FD 1113 flags=1
> >>> 2012/05/01 08:50:28 kid1| client_side.cc(1611)
> >>> keepaliveNextRequest: abandoning local=10.10.0.4:3128
> >>> remote=10.10.0.125:1356 FD 1024 flags=1
> >>> 2012/05/01 08:50:28 kid1| client_side.cc(1611)
> >>> keepaliveNextRequest: abandoning local=10.10.0.4:3128
> >>> remote=10.10.0.125:1357 FD 1220 flags=1
> >>> 2012/05/01 08:50:31 kid1| client_side.cc(1611)
> >>> keepaliveNextRequest: abandoning local=10.10.0.4:3128
> >>> remote=10.10.0.125:1359 FD 1152 flags=1
> >>>
> >>> I see that there was a discussion back in November about this
> >>> issue, but I am not understanding what the root cause is, and whether
> >>> it is indicative of a serious problem. Any thoughts?
> >>
> >> It would be excellent if you could pair one or two such message with
> >> the
> >> network traffic taking place on the client connection (the client is
> >> remote ip:port in the message).
> >>
> >> Regards
> >> Henrik
> >
> >
> > we are using squid 3.2.3 and also with the latest nightly build
> > (squid-3.2.3-20121119-r11700).
> >
> > first we thought this error is some new strange behavior of squid
> > under heavy load.
> > after some testing we found one error cause:
> >
> > we are using ufdbguard, squid.conf:
> >
> > url_rewrite_program /rzf/produkte/www/bin/ufdbguard/ufdbgclient -e
> > allow -C -l/rzf/produkte/www/files/ufdbguard/logs
> >
> >
> > we see this error messages when there was a https-request which is
> > blocked by ufdbguard.
> 
> Ah, in this case everything appears to be working exactly as designed.
> 
>   ... SSL-Bump is designed to pass the client a certificate representing
> the server Squid is contacting.
> 
>   ... SSL is designed to warn the client when traffic is intercepted.
> 
>   ... URL-rewrite interface DOES NOT "block" anything. Ever.
> 
> All the URL helper interface to Squid (ie used by ufdbguard) does is
> redirect (HTTP Status 30x) or alter the destination server (URL
> re-write).
> 
> This second re-write action is a form of interception. As is SSL-Bump
> interception. When you combine the two there are some behaviours which
> are to be expected:
>   * Getting the certificate wrong when you re-write the URL on an HTTPS
> URL is no more unexpected than getting HTML page URLs or Cookie domain
> names wrong when re-writing HTTP URLs.
>   * server-first bumping is not possible when re-writing the URL to
> contact a non-SSL HTTP, FTP, or Gopher server.
> 
> 
>   *please* do yourself a major favour and avoid *re-writing* URLs - 30x
> redirect is fine fro almost all events and avoids all these problems and
> more.
> 
> 
> If you can't alter the re-writer to perform redirection you can work
> around that by using:
> 
>   acl foo ... some test to match the re-written URL ...
>   deny_info 302:%s foo
>   adapted_http_access deny foo
> 
> IFAIK  ssl_bump_access can also use the 'foo' ACLs to determine
> client-first bumping on the re-written URLs. That will only work on the
> first CONNECT request though. Which again AFAIK, is the only point where
> certificate issues like this can occur.
> 
> 
> You could also workaround it by SSL-enabling your backend server
> presenting the errors. That will change the connection errors to
> domain-mismatch certificate errors at the client end but they will at
> least be able to read the rejection page if they click through the cert
> problems.
> 
> NOTE: passing the client request over an HTTP link after they connected
> with HTTPS is technically a critical security vulnerability. There may
> be any form of sensitive data in the request headers from the client
> which were the reason for it being HTTPS in the first place.
> 
> 
> > when ufdb blocks a request we do a redirect on a http-page to give
> > the user an error response.
> > with https-sites this does and can not work. the browser is expecting
> > a correct ssl answer and waits to get the right ssl-certificate from
> > the origin server, but our http-server does answer either with http
> > and not https and the fqdn is not correct. so in the browser I only
> > see a general "connection error" message, not our blocked script.
> >
> > this is the client side. in the cache_log we get
> >
> > 2012/11/21 13:09:40 kid1| client_side.cc(1602) keepaliveNextRequest:
> > abandoning local=123.45.1.1:8080 remote=123.45.6.7:1988 FD 2247
> > flags=1
> >
> >
> > now the question to this long mail:
> >
> > are these errors harmless, do we have to care about them? we have
> > really a lot of these messages.
> > what about open connections, ssl tunnels/connects, ntlm and/or basic
> > authentication handlers. are those connections all get closed
> > cleanly....
> 
> They are both harmless and not harmless.
> 
> They are harmless to Squid and its operation - beyond the wasted CPU
> cycles and bandwidth trying to cope with the issue.
> 
> They are harmful to to your service image. All these connection
> failures are, as you note, visible to the clients AND occuring often.
> 
> 
> Amos