Hi Amos, thanks for your response!
> Be careful, very very careful. ESPN is the example of the month for doing
> this badly. Their site refuses to open for anyone browsing from a host with
> a local proxy installed.
Thanks for the warning. I'll try it with a proxy server before putting
this into production.
>> I could configure Squid's IP address in Apache. But this is
>> undesirable, because Squid is running on EC2, its IP may change, and
>> further EC2 instances can come and go.
> The ONLY way to trust its contents is to verify that the listed content is
> correct, individually IP by IP starting with the machine directly
> connecting to supply it. If you can't track exactly where the proxy *is* in
> cyberspace then you cannot trust anything sent. XFF ACL tests will accept
> any ACL criteria that version of Squid can take.
I do not fully understand what you mean here, but I'll try to answer anyway:
All the reverse proxy servers will be run by us, so I consider them
trusted. But since Amazon does not provide availability guarantees or
stable IP addresses and load may change, we might have to add or remove
instances on the fly. I'd like to have a system that can handle this in
a robust way (reconfiguring Apache doesn't count as robust, too much
room for errors).
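As far as I understand the IP-by-IP check, it amounts to something like the following minimal Python sketch. The function name `client_from_xff` and the `trusted_networks` list are hypothetical, and the sketch assumes the proxy addresses are known in advance, which is exactly the part that is hard to guarantee on EC2.

```python
import ipaddress


def client_from_xff(peer_ip, xff_header, trusted_networks):
    """Return the first address, walking from the connecting peer
    backwards through X-Forwarded-For, that is NOT a trusted proxy.

    peer_ip          -- address of the machine that opened the connection
    xff_header       -- raw X-Forwarded-For value (', ' delimited list)
    trusted_networks -- hypothetical list of CIDR strings for our proxies
    """
    nets = [ipaddress.ip_network(n) for n in trusted_networks]

    def is_trusted(ip_text):
        addr = ipaddress.ip_address(ip_text)  # raises ValueError if malformed
        return any(addr in net for net in nets)

    # If the direct peer is not one of our proxies, ignore the header.
    if not is_trusted(peer_ip):
        return peer_ip

    last_verified = peer_ip
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    # The rightmost entry was added by the proxy closest to us.
    for hop in reversed(hops):
        try:
            if not is_trusted(hop):
                return hop  # first address not under our control
        except ValueError:
            return last_verified  # malformed entry: stop trusting the list
        last_verified = hop
    return last_verified
```

For example, with 10.0.0.0/8 as the only trusted network, a request arriving from 10.0.0.5 with `X-Forwarded-For: 203.0.113.7, 10.0.0.9` would be attributed to 203.0.113.7, while the same header arriving from an unknown peer would be ignored.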
> Better, Squid can send Basic auth login credentials in the
> Proxy-Authorization: header. squid-3.2 adds Negotiate auth protocol to
> this for more secure logins.
Hmm, that sounds like exactly what I was looking for! Why couldn't I
figure this out by myself...?
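To illustrate what such Basic credentials look like on the wire, here is a minimal Python sketch of building and checking the header value (the "Basic" scheme followed by base64 of `user:password`). The function names and the idea of checking it in application code are assumptions for illustration only; in a real setup the web server would presumably do this check itself.

```python
import base64
import hmac


def make_basic_credentials(user, password):
    """Build the header value a proxy would send for Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def check_basic_credentials(header_value, user, password):
    """Return True if header_value matches the expected shared login.

    compare_digest avoids leaking how many leading bytes matched.
    """
    expected = make_basic_credentials(user, password)
    return hmac.compare_digest(
        header_value.encode("utf-8"), expected.encode("utf-8")
    )
```

Since every proxy instance would carry the same shared login, new EC2 instances could be trusted without the backend knowing their IP addresses. Basic credentials are only base64-encoded, not encrypted, so this is only as safe as the link between proxy and backend.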
> Note that XFF contains a ', ' delimited list, so these rules may not work
> as intended.
I'm using the
    header_replace X-Forwarded-For
directive. If I understood it correctly, it should clear any existing
X-Forwarded-For headers, and Squid has no reason to add more than one IP
here, so I think it should be fine.
> NP: when using a proxy a large portion of the traffic will never even
> reach the web server. At this point only the squid logs are a true record
> of the visitor traffic.
Very good point, I didn't think about that. I guess we'll have to figure
out what the requirements for the log files are...
> Squid provides a syslog facility to push log lines out of the cloud proxy
> and back to a central server for processing.
So far I have hesitated to implement this, because it sounds like it uses a
lot of bandwidth, isn't that easy to set up securely, and may not be
that robust. But sooner or later I'll have to give it a try...
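To get a feel for what remote syslog involves, here is a minimal Python sketch that pushes log lines to a central server over UDP using the standard library. It is not Squid's own mechanism; the function name `make_syslog_logger`, the logger name and the choice of the LOCAL4 facility are assumptions for illustration.

```python
import logging
import logging.handlers


def make_syslog_logger(host, port=514, name="squid-access"):
    """Return a logger that forwards each record to a remote syslog
    server as one UDP datagram (fire and forget, no delivery guarantee)."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),  # UDP is the handler's default transport
        facility=logging.handlers.SysLogHandler.LOG_LOCAL4,
    )
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Calling `make_syslog_logger("logs.example.com").info(...)` with an access log line (the hostname is a placeholder) sends one small datagram per line. That keeps the overhead close to the size of the log itself, but plain UDP syslog is neither encrypted nor reliable, which matches the security and robustness concerns above.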
Regards,
David