Squid users,
Is it possible to use Squid as a reverse proxy (Content Accelerator) and
have the outbound request to the backend server spoof the original
client IP ?
I am using Linux and am already familiar with "Policy Routing". Given a
squid host with two physical ethernet cards, each a separate interface,
it is possible to ensure TCP reply packets go out of the correct
interface. So I have the networking logistics covered (more details
below for those keenly interested).
This would allow a public client IP address, let's say 1.2.3.4, to be
visible to the HTTP webserver when the requests come in, rather than
the IP address of the auto-bound interface facing the webserver.
What is unclear is whether squid/Linux can be set up so that squid picks
the source IP address it wants via the bind() system call, so
that the IP can be that of the original request into squid.
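
As a rough illustration of the bind()-before-connect() idea, here is a
small Python sketch. It only shows the system calls involved and is not
something Squid is known to do; connect_from is a made-up helper name.
On Linux, binding to an address that is not configured locally needs
something extra, such as the IP_TRANSPARENT socket option (value 19 in
<linux/in.h>, requires CAP_NET_ADMIN), so the sketch only sets it when
asked.

```python
import socket

# Linux-only option from <linux/in.h>. Older Python versions do not expose
# socket.IP_TRANSPARENT, so fall back to the numeric value.
IP_TRANSPARENT = getattr(socket, "IP_TRANSPARENT", 19)


def connect_from(source_ip, dest, transparent=False, timeout=5.0):
    """Open a TCP connection to dest (host, port) using source_ip locally."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        if transparent:
            # Needs CAP_NET_ADMIN; lets bind() accept a non-local address.
            sock.setsockopt(socket.IPPROTO_IP, IP_TRANSPARENT, 1)
        # Port 0 lets the kernel pick an ephemeral source port.
        sock.bind((source_ip, 0))
        sock.connect(dest)
        return sock
    except OSError:
        sock.close()
        raise
```

In the scenario I am asking about, the call would look something like
connect_from("1.2.3.4", (backend_ip, 80), transparent=True), where
backend_ip is a placeholder for the webserver address, and the routing
would have to bring the replies back through the squid host.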
Another alternative would be to employ something like Apache
JServ, which is currently used as a Java/JSP connector and allows for
proxying, and to have Squid speak that protocol.
My fictitious setup would be like this:
Public Internet    Load Balancer by NAT    Squid Accelerator    HTTP Webserver
1.2.3.4         -> 6.7.8.9              -> 10.1.0.1:8080     => 10.2.0.1:80
Where:
1.2.3.4 is the IP address of a member of the public's HTTP client
6.7.8.9 is the IP address of the webserver as resolved by DNS "A" record for "www.mydomain.com"
10.1.0.1 is the IP address of the squid accelerator public facing side
10.2.0.1 is the IP address of the backend webserver
For argument's sake:
10.1.0.2 is the IP address of the squid accelerator webserver facing side.
What happens:
1) A public HTTP client makes a connection request to www.mydomain.com
port 80; this resolves to 6.7.8.9.
2) The TCP connection packet arrives at the hosting setup and a "Load
Balancer by NAT" is the only equipment set up on this IP. This is usually
called a VIP (Virtual IP).
3) When the "Load Balancer by NAT" receives the packet it allows load
balancing to take place by looking to see what workers are active. In
the case of the simple setup above only one worker is defined, that is
at 10.1.0.1:8080. So the "Load Balancer by NAT" performs Network
Address Translation of the incoming packet so that the destination IP
address and port number are re-written to now be 10.1.0.1:8080 and the
same packet then continues to be routed on this basis.
4) The packet arrives at the Squid Accelerator host and is delivered
locally, because 10.1.0.1 is a local address (that of the eth0
interface). Squid is listening on port 8080. From this point
on squid accepts the connection and starts to read the request data.
Squid sees the original client IP from the getpeername() system call on
the socket.
5) After reading all the HTTP request data and checking its local
cache, Squid decides it needs to contact a backend webserver to receive
the content to satisfy this request.
*** This is the interesting part: what I am really asking is whether Squid can support it ***
6) Normally squid will open a regular TCP socket (possibly bound to a
specific single local IP, as per squid.conf) and then issue a connection
request to 10.2.0.1:80, which will cause the packet to go out of eth1
(squid interface facing the webserver). What I want it to do here
instead is to bind to the original client IP address and issue the
request, so that it effectively fakes/spoofs the client IP.
7) The webserver gets this inbound packet and processes it as normal. I
want the original client IP address to be visible here via the
getpeername() system call, but at present this ends up being 10.1.0.2
(the IP address of squid's interface facing the webserver).
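
To make steps 6 and 7 concrete, here is a loopback-only Python sketch
of what the webserver side observes. It is a stand-in for the real
setup, not a test of Squid: the listener plays the webserver, the
client socket plays Squid's outbound connection, and
peer_seen_by_server is a name invented for this example. Whatever
address the client binds to before connect() is what getpeername()
reports on the accepted socket.

```python
import socket


def peer_seen_by_server(client_source_ip):
    """Return (peer address seen by the server, client's own local address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server, \
         socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        # Stand-in for the webserver at 10.2.0.1:80.
        server.bind(("127.0.0.1", 0))
        server.listen(1)

        # Stand-in for Squid choosing its source IP before connecting.
        client.bind((client_source_ip, 0))
        client.connect(server.getsockname())

        # The handshake completes in the kernel, so accept() returns at once.
        conn, _ = server.accept()
        with conn:
            # Same information the webserver gets from getpeername().
            return conn.getpeername(), client.getsockname()
```

On a Linux host, where the whole 127.0.0.0/8 range is local to the
loopback interface, calling peer_seen_by_server("127.0.0.2") should
report a peer address starting with 127.0.0.2 rather than the default
127.0.0.1. Doing the same with a genuinely non-local address such as
1.2.3.4 is exactly the part that needs extra kernel and routing
support.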
I realize this might have implications on connection reuse between squid
<> webserver, but then so does authentication, session cookie affinity,
SSL session affinity, etc...
Linux has "echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind", which might
provide part of what is needed to achieve this.
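
A quick way to check that setting from a script is sketched below in
Python. The helper name nonlocal_bind_enabled is my own invention. As
far as I understand it, this sysctl only controls whether bind()
succeeds for an address that is not configured locally; getting the
reply packets back to the squid host is a separate routing matter.

```python
from pathlib import Path

NONLOCAL_BIND = Path("/proc/sys/net/ipv4/ip_nonlocal_bind")


def nonlocal_bind_enabled(path=NONLOCAL_BIND):
    """Return True or False for the sysctl value, or None if unreadable."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        # Not Linux, no /proc, or no permission to read the file.
        return None
```

A return value of None simply means the file could not be read, for
example on a non-Linux system.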
There is just one issue that a keen eye might spot: how does the
HTTP webserver know which squid proxy to route the traffic back via?
This presumes that in any larger setup there could be one or more squid
accelerators and one or more HTTP webservers. The answer would be to use
a different port or IP address for each squid, then have a "policy
route" at the webserver which picks up on this difference and defines a
different "default route via 10.1.0.2", so packets always flow back in
the right direction to the correct squid instance.
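
To show the kind of policy routing I have in mind at the webserver,
here is a Python sketch that only builds the iproute2 command lines and
does not run them. It takes the "different IP per squid" variant, and
assumes each squid connects to its own webserver-local IP and that the
squid's webserver-facing address is directly reachable as a gateway.
The function name and the table numbers starting at 101 are arbitrary
choices for illustration.

```python
def policy_route_commands(squids):
    """Build iproute2 commands for the webserver, one routing table per squid.

    squids maps the webserver-local IP a given squid connects to onto that
    squid's webserver-facing address (used as the gateway for replies).
    """
    commands = []
    pairs = sorted(squids.items())
    for table, (local_ip, gateway) in enumerate(pairs, start=101):
        # Replies leave with the webserver-local IP as source; match on it.
        commands.append(f"ip rule add from {local_ip} lookup {table}")
        # In that table, send everything back through the matching squid.
        commands.append(f"ip route add default via {gateway} table {table}")
    return commands
```

For the single-squid example above,
policy_route_commands({"10.2.0.1": "10.1.0.2"}) gives
"ip rule add from 10.2.0.1 lookup 101" followed by
"ip route add default via 10.1.0.2 table 101".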
The main issue is whether Squid has the feature support to auto-spoof
the client IP when talking to the backend.
Thanks for reading,
Darryl