
Squid as Content Accelerator with spoofing of outbound connections?

Squid users,

Is it possible to use Squid as a reverse proxy (Content Accelerator) and have the outbound request to the backend server spoof the original client IP?

I am using Linux and am already familiar with "Policy Routing". Given a squid host with 2 physical ethernet cards on different networks, it is possible to ensure TCP reply packets go out of the correct interface, so the networking logistics are covered (more details below for those keenly interested).
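For those interested, the sort of thing I mean on the squid host looks roughly like this (the squid addresses are the ones from the fictitious setup further down; the table numbers and gateway addresses are made up purely for illustration):

    # replies sourced from the public-facing address go back out eth0
    ip rule add from 10.1.0.1 table 100
    ip route add default via 10.1.0.254 dev eth0 table 100

    # traffic sourced from the webserver-facing address goes out eth1
    ip rule add from 10.1.0.2 table 101
    ip route add default via 10.2.0.254 dev eth1 table 101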

This would allow a public client IP address, let's say 1.2.3.4, to be visible to the HTTP webserver when the requests come in, rather than the IP address of the auto-bound squid interface facing the webserver.

What is unclear is whether squid/linux can be set up to allow squid to pick the local IP address it wants with the bind() system call, so that the outgoing connection's source IP can be that of the original request into squid.
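At the OS level this does look doable, at least for a hand-rolled test; the real question is whether Squid itself will do the bind. Something along these lines (addresses taken from the fictitious setup below; it only works end to end once the return routing described later is in place, otherwise the webserver's replies head off towards the real 1.2.3.4):

    # allow bind() to addresses that are not configured on this host
    echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind

    # crude smoke test: connect to the backend webserver with the source
    # address forced to the "client" IP
    nc -s 1.2.3.4 10.2.0.1 80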

An alternative would be to employ something like Apache JServ, which is currently used as a Java/JSP connector and allows for proxying, and have Squid speak that protocol.


My fictitious setup would be like this:


Public Internet      Load Balancer by NAT    Squid Accelerator     HTTP Webserver
1.2.3.4          ->  6.7.8.9              -> 10.1.0.1:8080      => 10.2.0.1:80


Where:

1.2.3.4 is the IP address of a member of the public's HTTP client
6.7.8.9 is the IP address of the webserver as resolved by DNS "A" record for "www.mydomain.com"
10.1.0.1 is the IP address of the squid accelerator's public-facing side
10.2.0.1 is the IP address of the backend webserver

For argument's sake:

10.1.0.2 is the IP address of the squid accelerator's webserver-facing side.



What happens:

1) A public HTTP client makes a connection request to www.mydomain.com port 80, which resolves to 6.7.8.9.

2) The TCP connection packet arrives at the hosting setup, where a "Load Balancer by NAT" is the only equipment configured on this IP. This is usually called a VIP (Virtual IP).

3) When the "Load Balancer by NAT" receives the packet it performs load balancing by looking to see which workers are active. In the simple setup above only one worker is defined, at 10.1.0.1:8080. The "Load Balancer by NAT" therefore performs Network Address Translation on the incoming packet so that the destination IP address and port number are rewritten to 10.1.0.1:8080, and the packet then continues to be routed on that basis.
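(If the "Load Balancer by NAT" happened to be a Linux box, the rewrite in step 3 would be roughly the equivalent of a DNAT rule like this; the actual load balancer may well be a dedicated appliance, so this is just for illustration:)

    # rewrite traffic arriving for the VIP so it is delivered to the worker
    iptables -t nat -A PREROUTING -d 6.7.8.9 -p tcp --dport 80 \
        -j DNAT --to-destination 10.1.0.1:8080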

4) The packet arrives at the Squid Accelerator host, because the IP address 10.1.0.1 is a local address (that of the eth0 interface). Port 8080 is listening, and it is Squid that is listening. From this point on squid accepts the connection and starts to read the request data. Squid sees the original client IP via the getpeername() system call on the socket.

5) Squid, after reading all the HTTP request data and checking its local cache, decides it needs to contact a backend webserver to obtain the content to satisfy this request.

*** This is the interesting part I am really asking if Squid can support ***

6) Normally squid will open a regular TCP socket (possibly bound to a specific single local IP, as per squid.conf) and then issue a connection request to 10.2.0.1:80, which will cause the packet to go out of eth1 (the squid interface facing the webserver). What I want it to do here instead is bind to the original client IP address and issue the request, so that it effectively fakes/spoofs the client IP.

7) The webserver gets this inbound packet and processes it as normal. I want the original client IP address to be visible here via the getpeername() system call, but at the moment this ends up being 10.1.0.2 (the IP address of squid's interface facing the webserver).
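(An easy way to check which source address actually shows up at the webserver is to watch the incoming SYNs there; interface name assumed:)

    # on the webserver: show the source address of new connections to port 80
    tcpdump -ni eth0 'tcp port 80 and tcp[tcpflags] & tcp-syn != 0'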


I realize this might have implications for connection reuse between squid <> webserver, but then so do authentication, session cookie affinity, SSL session affinity, etc...

Linux has "echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind", which might provide part of what is needed to achieve this.

There is just one issue a keen eye might spot: how does the HTTP webserver know which squid proxy to route the traffic back via? This presumes that in any larger setup there could be one or more squid accelerators and one or more HTTP webservers. The answer would be to use different ports or IP addresses for each squid, then have a "policy route" at the webserver which picks up on this difference and defines a different "default route via 10.1.0.2" for each, so packets always flow back to the correct squid instance.
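On the webserver that policy route would look something like this (using 10.2.0.1 as the address that squid #1 talks to; the table numbers, the interface name and the second squid's addresses are made up purely for illustration):

    # replies leaving from the address that squid #1 uses go back via squid #1
    ip rule add from 10.2.0.1 table 201
    ip route add default via 10.1.0.2 dev eth0 table 201

    # a second squid would get its own webserver-side address and table, e.g.
    # ip rule add from 10.2.0.2 table 202
    # ip route add default via 10.1.0.4 dev eth0 table 202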

The main question is whether Squid has the feature support to auto-spoof the client IP when talking to the backend.


Thanks for reading,

Darryl

