
Re: Caching issue with http_port when running in transparent mode

-------- Original Message --------
> Date: Tue, 05 Jun 2012 19:54:12 +0200
> From: "Hans Musil" <hans.musil@xxxxxx>
> To: Amos Jeffries <squid3@xxxxxxxxxxxxx>, squid-users@xxxxxxxxxxxxxxx
> Subject: Re: Caching issue with http_port when running in transparent mode

> Amos Jeffries wrote:
> 
> > On 29/05/2012 6:12 p.m., Hans Musil wrote:
> > > Amos Jeffries wrote:
> > >> On 29.05.2012 08:13, Eliezer Croitoru wrote:
> > >>> hey there Hans,
> > >>>
> > >>> are you serving Squid on the same machine as the gateway? (I wasn't
> > >>> sure about the DNAT.)
> > >>> your problem is not directly related to Squid but to the way that
> > >>> TCP and browsers work.
> > >>> for every connection that the client browser uses, there is a TCP
> > >>> window that stays alive for a period of time after the page was served.
> > >>> this will cause all the connections that were served using port
> > >>> 3128 to still exist for, I think, 5 to 10 more minutes, or whatever
> > >>> your TCP stack settings dictate.
> > >>
> > >> While that is true for the TCP details, I think HTTP connection 
> > >> behaviour is why it matters. For the TCP timeout closure to start 
> > >> happening, HTTP has to first stop using the connection.
> > >>
> > >> iptables NAT only affects SYN packets (i.e. new connections). So any 
> > >> existing TCP connections made by HTTP WILL continue to operate 
> > >> despite any changes to NAT rules.
> > >>
> > >> HTTP persistent connections, CONNECT tunnels and HTTP 
> > >> "streaming"/large objects have no fixed lifetime and idle timeouts 
> > >> of several minutes. It is quite common to see client TCP connections 
> > >> lasting for hours or days with HTTP traffic flowing throughout.
> > >>
> > >>>
> > >>> On 28/05/2012 22:34, Hans Musil wrote:
> > >>>> Hi,
> > >>>>
> > >>>> my box is running Debian Squeeze, which uses Squid version 
> > >>>> 2.7.STABLE9, but my problem also seems to affect Squid version 3.1.
> > >>>>
> > >>>> These are the important lines from my squid.conf:
> > >>>>
> > >>>> http_port 3128 transparent
> > >>>> http_port 3129 transparent
> > >>>> url_rewrite_program /etc/squid/url_rewrite.php
> > >>>>
> > >>>>
> > >>>> First, I did configure my Linux iptables like this:
> > >>>>
> > >>>> # Generated by iptables-save v1.4.8 on Mon May 28 21:04:09 2012
> > >>>> *nat
> > >>>> :PREROUTING ACCEPT [0:0]
> > >>>> :POSTROUTING ACCEPT [0:0]
> > >>>> :OUTPUT ACCEPT [0:0]
> > >>>> -A PREROUTING -i eth1 -p tcp -m tcp --dport 80 -j DNAT 
> > >>>> --to-destination 10.17.0.1:3128
> > >>>> COMMIT
> > >>>>
> > >>>> and everything works fine.
> > >>>>
> > >>>> But when I change the redirect port in the iptables settings from 
> > >>>> 3128 to 3129, Squid behaves strangely: my URL rewrite program is 
> > >>>> still sent myport=3128, although there are definitely no more 
> > >>>> requests on this port, only on 3129. This only affects HTTP 
> > >>>> domains that have already been requested before, i.e. with 
> > >>>> redirection to port 3128, and it works fine again when I do a 
> > >>>> force-reload in my browser. Also, things work again after waiting 
> > >>>> a few minutes.
> > >>>>
> > >>>> I suppose there is some strange caching inside Squid that maps the 
> > >>>> HTTP domain to an incoming port.
> > >>
> > >> No. There is only an active TCP connection. Multiple HTTP requests can
> > >> arrive on the connection long after you start sending unrelated new 
> > >> connections+requests through other ports.
> > >>
> > >>
> > >> What your helper was passed is the details about the request Squid 
> > >> received. It arrived on a TCP connection which was accepted through 
> > >> Squid port 3128. The fact that you changed the kernel settings after 
> > >> that connection was setup and operating is irrelevant.
> > >>
> > >>
> > >> URL-rewriting is a form of NAT on the URL, but with far worse 
> > >> side-effects than IP-layer NAT, and is often a sign of major design 
> > >> mistakes somewhere in the network. Why do you have to re-write in the
> > >> first place? Perhaps we could point you at a simpler, more 
> > >> standards-compliant setup.
> > >>
> > >> Amos
> > >>
> > > Thanks Amos. This makes things even clearer. Actually, I'd say that my
> > > problem is solved with the help of both of you. But well, let's have a
> > > look at my design.
> > >
> > > My goal is to build up an access control mechanism for my client 
> > > machines' access to the internet. As long as a user has not yet logged
> > > in, his client box should be completely cut off from the internet, not
> > > only HTTP.
> > >
> > > The login is done via a web interface. This is where the URL 
> > > rewriting comes in: it redirects any web traffic to the login page. 
> > > After the user has logged in, the client's HTTP packets will be 
> > > DNATed to the other Squid port in order to be regularly proxied. I 
> > > need the HTTP proxy for logging my users' HTTP requests.
> > >
> > > Since the users' client machines are out of my control, it is 
> > > important for me that they don't need any special configuration. 
> > > That's why Squid must run in transparent mode.
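
A per-client port flip of the kind described could be driven from the login/logout scripts roughly as below. This is only a sketch: the interface, IPs and ports are taken from the iptables dump earlier in the thread, the per-client rule layout is an assumption about how the flip is implemented, and the IPTABLES override exists purely so the command lines can be inspected without root.

```shell
#!/bin/sh
# Sketch of login/logout hooks for a per-client DNAT flip. Assumes the
# catch-all PREROUTING rule DNATs port 80 to the login port 3128 (as in
# the iptables dump above); on login we insert a more specific rule for
# that client, sending its traffic to port 3129 instead.
# IPTABLES can be overridden (e.g. with "echo") for a dry run.
IPTABLES="${IPTABLES:-iptables}"

on_login() {
    # -I inserts the per-client rule ahead of the catch-all 3128 rule.
    "$IPTABLES" -t nat -I PREROUTING -i eth1 -s "$1" -p tcp --dport 80 \
        -j DNAT --to-destination 10.17.0.1:3129
}

on_logout() {
    # -D removes the per-client rule; the catch-all 3128 rule applies again.
    "$IPTABLES" -t nat -D PREROUTING -i eth1 -s "$1" -p tcp --dport 80 \
        -j DNAT --to-destination 10.17.0.1:3129
}
```

As the thread goes on to discuss, established TCP connections keep following the old DNAT target, so a rule flip alone is not enough.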
> > 
> > Okay. As expected, a design problem. The huge problem with transparent 
> > interception is that the browser is 100% unaware that the proxy exists. As 
> > far as it is concerned, the re-written splash page or redirect response 
> > is the actual response from somebody else's domain name (google or your 
> > bank, for example). It has zero reason to think that a new TCP connection
> > is needed for follow-up requests. Just because the server of that page 
> > replied with Connection: close is no reason to expect Squid to pass the 
> > closure on to the client (quite the reverse: Squid will go out of its 
> > way to keep client connections open and re-used).
> > 
> > 
> > To fit in with your existing config that would be:
> > 
> >   acl port3128 myportname 3128
> >   deny_info http://your-login.example.com/ port3128
> >   http_access deny port3128
> > 
> > The full details and some other tricks can be found at 
> > http://wiki.squid-cache.org/ConfigExamples/Portal/Splash
> > 
> > This still hits the DNAT problems. I would suggest finding an 
> > external_acl_type helper that accesses whatever database your login 
> > script is recording client logins with. Using that as the ACL to deny / 
> > bounce new clients to the login page. With that design you can authorize
> > a client on their initial request and continue using the connection 
> > afterwards.
> > 
> > NP: I recently posted to the list a version of the external_acl_type 
> > helper I use myself for exactly this type of portal setup.
> > 
> > Amos
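
The external_acl_type approach Amos describes could be wired up roughly as follows. This is a sketch only: the helper path, the logged-in file and its one-IP-per-line layout are assumptions standing in for whatever database the login script really records clients in.

```shell
#!/bin/sh
# Hypothetical external_acl_type helper. squid.conf would reference it
# along these lines:
#   external_acl_type loggedin ttl=10 %SRC /etc/squid/check_login.sh
#   acl loggedin_users external loggedin
#   http_access deny !loggedin_users
# Squid writes one client IP per line (the %SRC token) on stdin; the
# helper answers OK for logged-in clients and ERR otherwise.
LOGIN_DB="${LOGIN_DB:-/var/lib/portal/logged_in.txt}"   # assumed path

check_ip() {
    # -x: match the whole line; -F: treat the IP as a literal string.
    if grep -qxF "$1" "$LOGIN_DB" 2>/dev/null; then
        echo "OK"
    else
        echo "ERR"
    fi
}

# Main loop: answer Squid's lookups until stdin closes. The tty guard
# keeps the script from blocking when sourced interactively.
if [ ! -t 0 ]; then
    while read -r ip; do
        check_ip "$ip"
    done
fi
```

With a ttl on the ACL, Squid caches each verdict for a few seconds, so the helper is consulted far less often than once per request.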
> 
> Amos, I'm back. Thanks for your last posting.
> 
> Your trick with acl, deny_info and http_access was a big help.
> 
> As far as I understand it, the external_acl_type helper needs to decide every
> few seconds whether a client is logged in or not. With some hundreds of
> clients, this means hundreds of database lookups per second. That's what I
> wanted to avoid by flipping the Squid port when a user logs in or out,
> respectively. This way, I only have one iptables rule change instead of
> multiple DB lookups.
> 
> As for the DNAT problem, I'm considering simply running "conntrack -D" with
> appropriate -s and -d options from my login/logout script. 
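
A conntrack flush of that kind might look like the sketch below. The CONNTRACK override is only there so the command line can be inspected without root, and the option spellings should be checked against the installed conntrack-tools version.

```shell
#!/bin/sh
# Hypothetical hook run on login/logout: delete the client's tracked TCP
# connections so already-established flows stop following the old DNAT
# target. Requires conntrack from conntrack-tools, run as root.
CONNTRACK="${CONNTRACK:-conntrack}"

flush_client() {
    # -D deletes matching entries; -s filters on the original source IP.
    # A -d filter could be added to narrow the match further.
    "$CONNTRACK" -D -p tcp -s "$1"
}

# Called from the login/logout script, e.g.: flush_client 10.17.0.55
```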
> 
> Hans

Oops, one more problem: Amos, your solution looks fine, but my login/logout script needs to know the client's IP, and it only sees my Squid's IP. I know there is the format tag %i, but this would require the non-stable version 3.2. Any better ideas?

Hans

