Squid as Content Accelerator with spoofing of outbound connections?

"Darryl L. Miles" <darryl-mailinglists@xxxxxxxxxxxx> · Mon, 18 Jun 2007 19:11:24 +0100

Squid users,

Is it possible to use Squid as a reverse proxy (Content Accelerator) and
have the outbound request to the backend server spoof the original
client IP?

I am using Linux and am already familiar with "Policy Routing".  Given a
squid host with 2 physical ethernet cards on different interfaces,
it is possible to ensure TCP reply packets go out of the correct
interface.  So I have the networking logistics covered (more details
below for those keenly interested).

This would allow a public client IP address, let's say 1.2.3.4, to be
visible to the HTTP webserver when the requests come in, rather than
the IP address of the auto-bound interface facing the webserver.

What is unclear is whether squid/linux can be set up to allow squid to
pick the source IP address it wants by using the bind() system call, so
that the IP can be that of the original request into squid.
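
To make the bind() idea concrete, here is a minimal Python sketch of
"bind before connect".  The helper name connect_from is made up for
illustration and this is not how Squid is implemented.  On a stock
system source_ip has to be an address that is local to the host;
trying it with 1.2.3.4 would normally fail with "Cannot assign
requested address".

```python
import socket


def connect_from(source_ip, dest, timeout=5.0):
    """Open a TCP connection to dest using source_ip as the source address.

    Hypothetical helper: source_ip must be an address the kernel is willing
    to bind to, which normally means an address configured on this host.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        # Port 0 lets the kernel pick an ephemeral source port.
        s.bind((source_ip, 0))
        s.connect(dest)
    except OSError:
        s.close()
        raise
    return s
```

The peer on the far end then sees whatever address was passed to
bind(), which is exactly the property wanted here if the kernel can be
persuaded to accept the original client IP.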

Another alternative would be to have Squid speak something like the
Apache JServ protocol, which is currently used as a Java/JSP connector
and allows for proxying.

My fictitious setup would be like this:

Public Internet      Load Balancer by NAT    Squid Accelerator     HTTP Webserver
1.2.3.4          ->  6.7.8.9              -> 10.1.0.1:8080      => 10.2.0.1:80

Where:

1.2.3.4 is the IP address of a member of the public's HTTP client
6.7.8.9 is the IP address of the webserver as resolved by DNS "A" record for "www.mydomain.com"
10.1.0.1 is the IP address of the squid accelerator public facing side
10.2.0.1 is the IP address of the backend webserver

For argument's sake:

10.1.0.2 is the IP address of the squid accelerator webserver facing side.

What happens:

1) A public HTTP client makes a connection request to www.mydomain.com
port 80; this resolves to 6.7.8.9.

2) The TCP connection packet arrives at the hosting setup, where a "Load
Balancer by NAT" is the only equipment set up on this IP. This is usually
called a VIP (Virtual IP).

3) When the "Load Balancer by NAT" receives the packet it allows load 
balancing to take place by looking to see what workers are active. In 
the case of the simple setup above only one worker is defined, that is 
at 10.1.0.1:8080.  So the "Load Balancer by NAT" performs Network 
Address Translation of the incoming packet so that the destination IP 
address and port number are re-written to now be 10.1.0.1:8080 and the 
same packet then continues to be routed on this basis.

4) The packet arrives at the Squid Accelerator host, because the IP
address 10.1.0.1 is a local address (that of the eth0 interface).
Squid is listening on port 8080.  From this point on squid accepts
the connection and starts to read the request data.  Squid sees the
original client IP from the getpeername() system call on the socket.

5) Squid, after reading all the HTTP request data and checking its local
cache, decides it needs to contact a backend webserver to fetch the
content to satisfy this request.

*** This is the interesting part; what I am really asking is whether Squid can support it ***

6) Normally squid will open a regular TCP socket (possibly bound to a
specific single local IP, as per squid.conf) and then issue a connection
request to 10.2.0.1:80, which will cause the packet to go out of eth1
(squid interface facing the webserver).  What I want it to do here
instead is bind to the original client IP address and issue the
request, so that it effectively fakes/spoofs the client IP.

7) The webserver gets this inbound packet and processes it as normal.  I
want the original client IP address to be visible here via the
getpeername() system call, but at present this ends up being 10.1.0.2
(the IP address of squid's interface facing the webserver).
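
The current behaviour in steps 4 to 7 can be reproduced on loopback
with a small Python sketch.  Everything here is a stand-in: the "proxy"
and "backend" are plain sockets on 127.0.0.1, not Squid or a real
webserver, and the function names are invented for illustration.
Because all three parties share 127.0.0.1, the hops are told apart by
their (address, port) pairs rather than by address alone.

```python
import socket
import threading


def listen(host="127.0.0.1"):
    """Return a listening TCP socket on an ephemeral port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, 0))
    s.listen(1)
    return s


def demo():
    backend = listen()  # stands in for the webserver (10.2.0.1:80)
    proxy = listen()    # stands in for Squid (10.1.0.1:8080)
    seen = {}

    def run_proxy():
        # accept() returns the same address getpeername() would report.
        conn, addr = proxy.accept()
        seen["proxy_sees"] = addr  # step 4: the original client
        # Step 6: an ordinary outbound socket, source chosen by the kernel.
        out = socket.create_connection(backend.getsockname())
        seen["proxy_out"] = out.getsockname()
        out.close()
        conn.close()

    def run_backend():
        conn, addr = backend.accept()
        seen["backend_sees"] = addr  # step 7: the proxy, not the client
        conn.close()

    threads = [threading.Thread(target=run_proxy, daemon=True),
               threading.Thread(target=run_backend, daemon=True)]
    for t in threads:
        t.start()
    client = socket.create_connection(proxy.getsockname())
    seen["client"] = client.getsockname()
    for t in threads:
        t.join()
    client.close()
    proxy.close()
    backend.close()
    return seen
```

Calling demo() returns a dict in which "proxy_sees" matches "client",
while "backend_sees" matches "proxy_out".  The backend only ever learns
the source of the proxy's own outbound socket, which is the behaviour I
would like to change.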

I realize this might have implications on connection reuse between squid 
<> webserver, but then so does authentication, session cookie affinity, 
SSL session affinity, etc...

Linux has "echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind", which might
provide part of what is needed to achieve this.
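
As a rough way to probe this, the Python sketch below reads the sysctl
and tries a bind() to a given address, optionally with the per-socket
IP_FREEBIND option that ip(7) describes as the counterpart of
ip_nonlocal_bind.  The function names are made up, the fallback value
15 is the Linux constant from <linux/in.h> (an assumption if your
Python build does not export it), and the whole thing is Linux-only.
A successful bind() is only part of the picture: it does not by itself
show that connect() or the return path will work.

```python
import socket

# Linux value from <linux/in.h>; used only if this Python build lacks it.
IP_FREEBIND = getattr(socket, "IP_FREEBIND", 15)


def nonlocal_bind_enabled(path="/proc/sys/net/ipv4/ip_nonlocal_bind"):
    """Return True if the system-wide sysctl reads 1, False otherwise."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return False  # not Linux, or /proc is not readable


def can_bind(source_ip, freebind=False):
    """Try to bind a TCP socket to source_ip and report whether it worked."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        if freebind:
            try:
                s.setsockopt(socket.IPPROTO_IP, IP_FREEBIND, 1)
            except OSError:
                return False  # option not supported on this platform
        try:
            s.bind((source_ip, 0))
            return True
        except OSError:
            return False
```

For example, comparing can_bind("1.2.3.4") with
can_bind("1.2.3.4", freebind=True) on the squid host would show whether
the kernel is willing to hand out a non-local source address at all.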

There is just one issue that a keen eye might spot: how does the
HTTP webserver know which squid proxy to route the traffic back via?
This presumes that in any larger setup there could be 1+ squid
accelerators and 1+ HTTP webservers.  Well, the answer to that would be
to use a different port or IP address for each squid, then have a
"policy route" at the webserver which picks up on this difference and
defines a different "default route via 10.1.0.2", so packets always
flow back in the right direction to the correct squid instance.
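
For the "different IP address per squid" variant, the webserver-side
policy routing could look roughly like the output of this Python
sketch.  It only builds and prints iproute2 command lines; it does not
run them.  The second address pair (10.2.0.2 and 10.1.0.3) and the
table numbers starting at 100 are hypothetical values chosen for
illustration.

```python
def policy_route_commands(squids, first_table=100):
    """Build iproute2 commands for the webserver, one routing table per squid.

    squids: list of (webserver_local_ip, squid_gateway_ip) pairs, where each
    squid connects to its own local IP on the webserver.
    """
    commands = []
    for offset, (local_ip, gateway) in enumerate(squids):
        table = first_table + offset
        # Replies sourced from this local IP consult the squid-specific table,
        commands.append(f"ip rule add from {local_ip} table {table}")
        # whose default route points back at that squid.
        commands.append(f"ip route add default via {gateway} table {table}")
    return commands


if __name__ == "__main__":
    pairs = [("10.2.0.1", "10.1.0.2"), ("10.2.0.2", "10.1.0.3")]
    for line in policy_route_commands(pairs):
        print(line)
```

The rule matches on the source address of the reply packet, so
whichever local IP the squid connected to decides which squid the reply
is routed back through.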

The main issue is whether Squid has the feature support to auto-spoof
the client IP when talking to the backend.

Thanks for reading,

Darryl