Re: Caching issue with http_port when running in transparent mode

One important thing to be aware of: if you are using the same box as both the
gateway and the Squid box, it's better to use the iptables REDIRECT target
instead of DNAT.
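
As an example, a sketch of the REDIRECT variant (reusing the interface and
port from the config quoted further down in this thread; REDIRECT only applies
when Squid listens on the same box the rule runs on):

# rough equivalent of the DNAT rule quoted below, for the one-box case
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j REDIRECT --to-ports 3128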

You can always try NoCatSplash:
http://nocat.net/downloads/NoCatSplash/

or write your own helper.
It can be pretty simple to build such a helper: you just need some NAT
chains/tables in iptables that redirect every connection to the outside world
to a web server with a login page, and that page is hooked up to a script that
adds entries to an iptables "allow" chain (a rough sketch follows below).
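
A minimal sketch of that idea (the chain name, the client address and the
login web server address/port are placeholders, not part of anyone's actual
setup):

# clients that already logged in are matched here; ACCEPT ends nat processing,
# so they are not redirected
iptables -t nat -N portal_allow
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j portal_allow
# everyone else is sent to the web server holding the login page
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j DNAT --to-destination 10.17.0.1:80
# the login script inserts one rule per authenticated client, e.g.:
iptables -t nat -I portal_allow -s 192.168.0.50 -j ACCEPT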

Do you need some username-and-password authentication mechanism, or just a
splash screen where the user agrees to some rules/terms?

Eliezer

On 29/05/2012 09:12, Hans Musil wrote:
Amos Jeffries wrote:
On 29.05.2012 08:13, Eliezer Croitoru wrote:
Hey there Hans,

Are you serving Squid on the same machine as the gateway? (I wasn't
sure about the DNAT.)
Your problem is not directly related to Squid but to the way that TCP
and browsers work.
For every connection the client browser uses there is a TCP window
that stays alive for a period of time after the page has been served.
This causes all the connections that were served through port
3128 to keep existing for, I think, 5 to 10 more minutes, or whatever
your TCP stack settings say.
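
Just as an illustration, those lingering connections are visible on the Squid
box with something like:

netstat -tn | grep ':3128 '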

While that is true for the TCP details, I think HTTP connection
behaviour is why it matters. For the TCP timeouts and closure to start
happening, HTTP first has to stop using the connection.

iptables NAT only affects SYN packets (ie new connections). So any
existing TCP connections made by HTTP WILL continue to operate despite
any changes to NAT rules.
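
To illustrate (a sketch; the conntrack tool from the conntrack-tools package
is not installed by default):

# list tracked flows still pinned to the old DNAT target
conntrack -L -p tcp | grep 3128
# flushing the connection tracking table forces clients onto new connections (disruptive)
conntrack -F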

HTTP persistent connections, CONNECT tunnels and HTTP
"streaming"/large objects have no fixed lifetime and idle timeouts of
several minutes. It is quite common to see client TCP connections
lasting whole hours or days with HTTP traffic flowing throughout.


On 28/05/2012 22:34, Hans Musil wrote:
Hi,

my box is running Debian Squeeze, which ships Squid version
2.7.STABLE9, but my problem also seems to affect Squid version 3.1.

These are the important lines from my squid.conf:

http_port 3128 transparent
http_port 3129 transparent
url_rewrite_program /etc/squid/url_rewrite.php


First, I configured my Linux iptables like this:

# Generated by iptables-save v1.4.8 on Mon May 28 21:04:09 2012
*nat
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A PREROUTING -i eth1 -p tcp -m tcp --dport 80 -j DNAT
--to-destination 10.17.0.1:3128
COMMIT

and everything works fine.

But when I change the redirect port in the iptables settings from
3128 to 3129, Squid behaves strangely: my URL rewrite program still
gets sent myport=3128, although there are definitely no more requests
on that port, only on 3129. This only affects HTTP domains that
have already been requested before, i.e. with redirection to port
3128, and it works fine again when I do a force-reload in my
browser. Things also sort themselves out after waiting a few minutes.

I suppose there is some strange caching inside Squid that maps the
HTTP domain to an incoming port.

No. There is only an active TCP connection. Multiple HTTP requests can
arrive on the connection long after you start sending unrelated new
connections+requests through other ports.


What your helper was passed is the details about the request Squid
received. It arrived on a TCP connection which was accepted through
Squid port 3128. The fact that you changed the kernel settings after
that connection was set up and operating is irrelevant.
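
As a rough illustration of what such a helper sees (a sketch only, not Hans's
actual url_rewrite.php; the log path is just an example, and it assumes
url_rewrite_concurrency is not enabled):

#!/bin/sh
# Squid 2.7 writes one request per line: the URL first, then fields such as
# client ip/fqdn, user and method, plus key=value pairs like the myport=
# Hans is seeing (the http_port the TCP connection was accepted on).
while read -r line; do
    echo "$line" >> /tmp/rewrite-input.log   # example path, for inspection only
    echo ""                                  # empty reply = leave the URL unchanged
done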


URL-rewriting is a form of NAT on the URL, but with far worse
side-effects than IP-layer NAT, and it is often a sign of major design
mistakes somewhere in the network. Why do you have to re-write in the
first place? Perhaps we could point you at a simpler, more
standards-compliant setup.

Amos

Thanks Amos. This makes things even clearer. Actually, I'd say that my
problem is solved with the help of both of you. But well, let's have a
look at my design.

My goal is to build an access control mechanism governing my client
machines' access to the internet. As long as a user has not yet logged in, his
client box should be completely cut off from the internet, not only HTTP.

The login is done via a web interface; that is where the URL rewriting
redirects all web traffic while the user is not yet logged in. After the user
has logged in, the client's HTTP packets are DNATed to the other Squid port so
that they are proxied normally. I need the HTTP proxy for logging my users'
HTTP requests.

Since the users' client machines are out of my control, it is important
for me that they don't need any special configuration. That's why
Squid must run in transparent mode.

Remark: I'm about to leave for a week of holidays, so I probably
won't be able to respond before the end of next week.

Hans



--
Eliezer Croitoru
https://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer <at> ngtech.co.il

